PART 1
BELL SYSTEM 
TECHNICAL JOURNAL


High-Power Lasers and Optical Waveguides for Robotic Material-Processing
Applications 2479
Chinlon Lin, G. Beni, S. Hackwood, and T. J. Bridges

Estimates of Path Loss and Radiated Power for UHF Mobile-Satellite Systems 2493
D. O. Reudink

Coding of Two-Level Pictures by Pattern Matching and Substitution 2513
O. Johnsen, J. Segen, and G. L. Cash

Data-Transport Performance Analysis of Fasnet 2547
D. P. Heyman

Forecasting With Adaptive Gradient Exponential Smoothing 2561
A. Feuer

Application of the Minimum-Weight Spanning-Tree Algorithm to Assignment of
Communication Facilities 2581
N. A. Strakhov

Note on the Properties of a Vector Quantizer for LPC Coefficients 2603
L. R. Rabiner, M. M. Sondhi, and S. E. Levinson

Upper Bounds on the Minimum Distance of Trellis Codes 2617
A. R. Calderbank, J. E. Mazo, and H. M. Shapiro


PAPERS BY BELL LABORATORIES AUTHORS 2647 
CONTENTS, NOVEMBER 1983 2661 


THE BELL SYSTEM TECHNICAL JOURNAL 


ADVISORY BOARD 


D. E. PROCKNOW, President Western Electric Company
I. M. ROSS, President Bell Telephone Laboratories, Incorporated 
W. M. ELLINGHAUS, President American Telephone and Telegraph Company 


EDITORIAL COMMITTEE 


A. A. PENZIAS, Committee Chairman, Bell Laboratories 


M. M. BUCHNER, JR., Bell Laboratories R. A. KELLEY, Bell Laboratories 
R. P. CLAGETT, Western Electric R. W. LUCKY, Bell Laboratories
T. H. CROWLEY, Bell Laboratories R. L. MARTIN, Bell Laboratories 
B. R. DARNALL, Bell Laboratories J. S. NOWAK, Bell Laboratories 
B. P. DONOHUE, III, AT&T Information Systems L. SCHENKER, Bell Laboratories
I. DORROS, AT&T G. SPIRO, Western Electric


J. W. TIMKO, AT&T Information Systems 


EDITORIAL STAFF 


B. G. KING, Editor LOUISE S. GOLLER, Assistant Editor 
PIERCE WHEELER, Managing Editor H. M. PURVIANCE, Art Editor 
B. G. GRUBER, Circulation 


THE BELL SYSTEM TECHNICAL JOURNAL (ISSN 0005-8580) is published by the American
Telephone and Telegraph Company, 195 Broadway, N. Y., N. Y. 10007; C. L. Brown, Chairman 
and Chief Executive Officer; W. M. Ellinghaus, President; V. A. Dwyer, Vice President and 
Treasurer; T. O. Davis, Secretary. 


The Journal is published in three parts. Part 1, general subjects, is published ten times each 
year. Part 2, Computing Science and Systems, and Part 3, single-subject issues, are published 
with Part 1 as the papers become available. 


The subscription price includes all three parts. Subscriptions: United States—1 year $35; 2 
years $63; 3 years $84; foreign—1 year $45; 2 years $73; 3 years $94. Subscriptions to Part 2 
only are $10 ($12 foreign). Single copies of the Journal are available at $5 ($6 foreign). Payment 
for foreign subscriptions or single copies must be made in United States funds, or by check 
drawn on a United States bank and made payable to The Bell System Technical Journal and 
sent to Bell Laboratories, Circulation Dept., Room 1E-335, 101 J. F. Kennedy Parkway, Short
Hills, N. J. 07078. 


Single copies of material from this issue of The Bell System Technical Journal may be 
reproduced for personal, noncommercial use. Permission to make multiple copies must be 
obtained from the editor. 


Comments on the technical content of any article or brief are welcome. These and other 
editorial inquiries should be addressed to the Editor, The Bell System Technical Journal, Bell 
Laboratories, Room 1J-319, 101 J. F. Kennedy Parkway, Short Hills, N. J. 07078. Comments 
and inquiries, whether or not published, shall not be regarded as confidential or otherwise 
restricted in use and will become the property of the American Telephone and Telegraph 
Company. Comments selected for publication may be edited for brevity, subject to author 
approval. 


Printed in U.S.A. Second-class postage paid at Short Hills, N. J. 07078 and additional mailing 
offices. Postmaster: Send address changes to The Bell System Technical Journal, Room 1E- 
335, 101 J. F. Kennedy Parkway, Short Hills, N. J. 07078. 


© 1983 American Telephone and Telegraph Company. 


THE BELL SYSTEM 
TECHNICAL JOURNAL 


DEVOTED TO THE SCIENTIFIC AND ENGINEERING 
ASPECTS OF ELECTRICAL COMMUNICATION 


Volume 62 October 1983 Number 8, Part 1 


High-Power Lasers and Optical Waveguides for 
Robotic Material-Processing Applications 


By CHINLON LIN,* G. BENI,* S. HACKWOOD,* and
T. J. BRIDGES* 


(Manuscript received April 4, 1983) 


For various material-processing applications with robots we propose the use
of high-power continuous-wave and pulsed lasers (Nd³⁺:YAG, argon ion, CO₂,
excimer, etc.) and optical waveguides for delivering high powers in the
ultraviolet (UV), the visible, and the infrared (IR) regions. We discuss the
use of low-loss silica glass fiber waveguides for delivering high-power laser
beams in the UV to near-IR spectral region (0.3 to 2 μm), and the use of a
waveguiding articulating arm for delivering high-power laser beams in the
long-IR region (2 to 10 μm). We also describe a design for fitting a CO₂ laser
waveguiding arm to a robotic arm, and the advantages of using optical
waveguides for high-power laser delivery to robots for material processing.


I. INTRODUCTION 


Optical waveguides are known to be useful for optical signal transmission,
in which low-power, modulated light from semiconductor injection lasers
is used for lightwave communication applications.’ The advent
of low-loss optical fiber waveguides, for example, has made possible
long-distance, high-bandwidth lightwave communication systems for
transmitting audio, data, and video signals. This paper discusses the
use of optical waveguides for a different application: high-power laser
transmission for robotic material processing. Using high-power
continuous-wave (cw) and pulsed lasers and appropriate optical
waveguides for the ultraviolet (UV), the visible, the near infrared (IR),
and the longer IR, a robot can manipulate the output beam of a variety
of high-power lasers for various processing functions. In Section II we
discuss the available flexible waveguides for high-power laser
transmission. In Section III we describe a manually operated CO₂ laser
waveguiding articulating arm, and in Section IV a design for fitting
the waveguiding articulating CO₂ laser arm to a robot arm.

At present, automation of material processing using high-power 
lasers requires costly, dedicated, large-size equipment. We believe that 
an inexpensive, small-size robot controlling a high-power laser beam 
with the help of optical waveguides will make possible many new 
applications in material processing. 


II. HIGH-POWER LASER TRANSMISSION IN OPTICAL FIBER
WAVEGUIDES 


While the use of high-power lasers for material processing is well 
known,”* the use of low-loss optical waveguides for high-optical-power 
transmission is not widely practiced.* For robotic applications (appli- 
cations requiring the unique dexterity and versatility of robots), it is 
essential that the combination of high-power laser technology and 
robotics does not reduce the dexterity or flexibility of the robots. The 
essential element here for providing the flexible link between the high- 
power lasers (usually heavy and bulky) and the robots is the optical 
waveguide. 

Figure 1 illustrates the basic system schematic for using high-power
lasers and optical waveguides in robotic material-processing applications.
Different optical waveguides can be used, depending on the type of
high-power laser. For example, in the near-infrared region of 1 to 2 μm
(e.g., for high-power Nd:YAG lasers at 1.06 μm), silica glass
fibers have excellent transmission characteristics (see Fig. 2). As a
result of advances in lightwave communications technology, the loss
in a silica fiber waveguide can be very low (~1 dB/km, or 0.01 dB/10 m,
at 1.06 μm). In this case the loss due to coupling into and out of the
fiber waveguide is much larger than the transmission loss for even a
1-km-long optical fiber. Losses in silica glass fibers also can be low
enough (for 10- to 100-m lengths) for guiding blue-green and red lasers;
thus such silica fibers are useful for transmission of high-power argon
ion and krypton ion lasers, as well as high-power ruby and alexandrite
lasers. Recently available special UV silica glass fibers* may also be
used for the ultraviolet wavelength region (0.3 to 0.4 μm). The loss is
about 1 to 2 dB for every 10 m, which is still low. Such fibers are useful
for transmitting UV lasers (e.g., He-Cd lasers and excimer lasers) in,
for example, photochemical applications.

Fig. 1. Schematic of a system using flexible optical fiber waveguides for
delivering high-power laser radiation to the robotic arm/hand for various
material-processing applications.

Fig. 2. Loss spectra of a typical low-loss silica glass fiber waveguide. For
practical robotic material-processing applications, fiber loss of 1 dB/10 m
(or 100 dB/km) could be considered low loss.
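The loss figures quoted in this section can be converted into transmitted power fractions with a few lines of arithmetic. The sketch below is illustrative only: the function name and the choice of a 10-m run are assumptions, coupling losses are ignored, and the three loss values are the ones quoted in the text, in the Fig. 2 caption, and in the footnote on UV fibers.

```python
def transmitted_fraction(loss_db_per_km: float, length_m: float) -> float:
    """Fraction of launched power remaining after length_m of fiber.

    Only propagation loss is modeled; input and output coupling
    losses (which the text notes can dominate) are ignored.
    """
    loss_db = loss_db_per_km * length_m / 1000.0
    return 10.0 ** (-loss_db / 10.0)


# Loss values taken from the text; the 10-m length is an illustrative choice.
cases = {
    "silica at 1.06 um (~1 dB/km)": 1.0,
    "'low loss' for robotics (100 dB/km)": 100.0,
    "UV silica near 310 nm (150 dB/km)": 150.0,
}

for label, loss_db_per_km in cases.items():
    fraction = transmitted_fraction(loss_db_per_km, 10.0)
    print(f"{label}: {100.0 * fraction:.1f}% transmitted over 10 m")
```

Under these assumptions, even the lossier UV fiber still passes roughly 70 percent of the launched power over 10 m, which is why a loss of 1 to 2 dB per 10 m can be treated as low for this application.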

Thus appropriate optical fiber waveguides are available for
transmitting high-power laser radiation in the spectral region from
UV to near IR through at least 5 to 10 meters. This allows the bulky
high-power laser head and its high-energy power supply (and cooling
system, if any) to be separated from the robot, while allowing the
powerful laser beam to be delivered to the robot arm or fingertip. As
shown in Fig. 1, a high-power Nd:YAG laser and silica glass fiberguide
could be used for guiding the laser radiation to the robot hand
(gripper). The silica glass fiber can be routed inside the robotic arm
assembly, or mounted externally but attached to the side of the arm,
depending on the situation or work requirement. The output fiber end
can have a microlens (such as a half-pitch graded-index-rod lens) or a
small conventional lens attached for output beam focusing.

For transmitting the output of Nd:YAG, ruby, and argon ion lasers (in
the visible and the near-infrared spectral regions), the silica glass fibers
are typically very small in dimension: outer diameters are on the order
of a few hundred micrometers to a few millimeters, including the
protecting jacket or cable. For high-laser-power output with
well-defined spatial distribution (e.g., for maximum brightness, or best
focusing), single-mode fibers with appropriate refractive index
difference Δn and core diameter 2a can be designed (with normalized
frequency V < 2.4 at the laser wavelength) for use in these different
wavelength regions. If maximum overall energy transmission without
concern for the spatial quality of the laser beam output is desired, a
large-core, high numerical aperture (N.A.) silica glass fiber can be
used for high-energy delivery to the robot. For the propagation
properties and design considerations in single-mode and multimode silica
glass fibers, appropriate references’ should be consulted.
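The single-mode condition on the normalized frequency can be made concrete with the standard step-index relation V = 2πa(N.A.)/λ. The sketch below is only an illustration: the numerical aperture of 0.1 is a hypothetical value, not one given in the paper, and the cutoff of 2.405 is the usual step-index figure that the text rounds to 2.4.

```python
import math

V_CUTOFF = 2.405  # single-mode cutoff for a step-index fiber (text rounds to 2.4)


def v_number(core_diameter_um: float, wavelength_um: float, na: float) -> float:
    """Normalized frequency V = pi * (2a) * NA / lambda."""
    return math.pi * core_diameter_um * na / wavelength_um


def max_single_mode_core_diameter_um(wavelength_um: float, na: float) -> float:
    """Largest core diameter 2a that keeps V at or below the cutoff."""
    return V_CUTOFF * wavelength_um / (math.pi * na)


# Hypothetical numerical aperture, chosen only for illustration.
ASSUMED_NA = 0.1

for name, wavelength_um in [("Nd:YAG, 1.06 um", 1.06), ("argon ion, 0.514 um", 0.514)]:
    d_max = max_single_mode_core_diameter_um(wavelength_um, ASSUMED_NA)
    print(f"{name}: core diameter up to about {d_max:.1f} um stays single mode")
```

The point of the exercise is the scaling: for a fixed numerical aperture the allowed core diameter shrinks in proportion to the wavelength, so a fiber that is single mode at 1.06 μm may carry several modes in the visible.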

For transmitting high-power, longer-infrared (2 to 10 μm) lasers
such as CO₂ lasers, a configuration similar to that shown in Fig. 1 can
be used, if a truly flexible CO₂ laser fiberguide is available. Presently,
various glass and crystal fibers are being developed for this spectral
region.” Notable among them are the polycrystalline KRS-5 fibers®
and the single-crystal AgBr fibers’ for CO₂ laser transmission.
However, presently available long-infrared fibers tend to be very lossy and
fragile, so their mechanical and optical properties are not yet truly
satisfactory. Therefore, at present, bulky conventional articulating
arms (consisting of aligned mirrors) are used for most applications
requiring some flexibility in CO₂ laser delivery. To improve the
flexibility and stability, Bridges and Strnad have developed a novel
“waveguiding” articulating arm for transmitting high-power CO₂ laser
radiation.® The arm, shown in Fig. 3, has been designed for manual
control. It is compact and relatively articulate. In the future, truly
flexible long-wavelength fibers are expected to have lower loss and
higher strength than those presently available. Until then, the
Bridges/Strnad waveguiding articulating arm would be the choice for CO₂
laser delivery to the robot. In Section III we discuss in more detail the
design of this CO₂ laser arm; in Section IV we describe fitting this
laser arm onto a robot arm for material processing.

* UV fibers with losses in the 150 dB/km range for λ ~ 310 nm have been
reported by, for example, Quartz and Silice, France.

Fig. 3. The Bridges/Strnad waveguiding articulating arm for high-power CO₂
laser delivery.


III. BRIDGES/STRNAD WAVEGUIDING ARTICULATING ARM FOR LONG-IR
LASER RADIATION

Figure 4 shows the design details of the Bridges/Strnad arm used
for manual operation (see Fig. 3). This articulating arm uses the
principles of waveguiding in hollow dielectric tubes. It has
a number of advantages over conventional articulating arms, including
compactness and better pointing accuracy.® Flexible waveguides such as
metal waveguides” and presently available infrared fibers are problematic
because of the multimode nature of the guide. Single-mode radiation
from the laser is rapidly degraded into a multiple-mode pattern that
changes in form as the guide is moved. The degradation considerably
reduces the maximum intensity that can be obtained by focusing
the output radiation. In articulating arms of conventional
design the single mode is preserved, but unless the input beam is
launched precisely on axis and the mechanism of the arm is precisely
correct, the output beam will wander in a complicated manner as the
arm is manipulated. Such arms are also characteristically large and
cumbersome. The Bridges/Strnad arm design avoids these problems by
propagating the radiation in straight, hollow, dielectric waveguides of
the Marcatili-Schmeltzer type.¹¹ A single mode can be maintained in
the guide, while the pointing accuracy is far less affected by initial
launch conditions and accuracy of construction. (Pointing accuracy is
determined by how closely the direction of the output beam conforms
to the mechanical axis of the arm.) A further advantage is the compact
design that results because the guiding action of the waveguide
eliminates diffraction spreading of the beam.

Fig. 4. The design of the Bridges/Strnad waveguiding articulating laser arm
for flexible delivery of high-power, long-IR (e.g., CO₂ laser at 10.6 μm)
optical radiation. Labeled parts: dielectric waveguide, rotating joints (R),
and ball bearings.

The Marcatili-Schmeltzer waveguide carries radiation in the hollow 
circular bore of a dielectric tube. The dielectric need not be transparent 
to the radiation being guided. The mechanism of guiding can be 
thought of as a continual-glancing-angle Fresnel reflection from the 
dielectric walls. This reflection is not total, but it is close to 100 percent
for very shallow incident angles to the walls. The modes of propagation
have been calculated by Marcatili and Schmeltzer,¹¹ who found that
the lowest-loss mode is the EH₁₁ mode. An appropriate waveguide size
is 50 to 200 wavelengths in diameter. This size is large enough to give
low loss, but still retain adequate guiding so that straightness of the 
tube is not an important factor, although curvature of the tube axis 
introduces extra loss by an amount that increases with tube diameter. 

Since the dielectric need not be transparent to the radiation, glass 
or quartz tubing which is readily obtainable in precision bore form can 
be used to transport 10.6-μm radiation. Single-mode laser radiation is
conveniently launched into the waveguide by means of a lens (Fig. 5). 
The focal length of the lens is chosen to closely match the input 
Gaussian beam to the guided beam with small loss.” Short gaps in the 
tube can be tolerated with small loss so that mirrors which turn the 
beam through a 90-degree angle and are basic to the operation of the 
infrared articulating arm can be used in a simple arrangement (see 
Fig. 4). 
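As a rough illustration of how the lens focal length relates to the input beam, the sketch below estimates the beam size needed at the lens. It rests on two assumptions that are not stated in the paper: that the best match to the EH₁₁ mode occurs when the focused Gaussian waist radius is about 0.64 of the bore radius (a figure commonly quoted for hollow dielectric guides), and that the input beam is collimated at a thin lens so that the focused waist is λf/(πw). The bore diameter and focal length are the values reported for the laboratory test in this section.

```python
import math

# Assumed optimum ratio of Gaussian waist radius to bore radius for
# coupling into the EH11 mode; this value is not given in the paper.
WAIST_TO_BORE_RADIUS = 0.6435


def matched_waist_mm(bore_diameter_mm: float) -> float:
    """Waist radius (1/e^2 intensity) that matches the guide, under the assumption above."""
    return WAIST_TO_BORE_RADIUS * bore_diameter_mm / 2.0


def beam_radius_at_lens_mm(wavelength_mm: float, focal_length_mm: float,
                           waist_mm: float) -> float:
    """Collimated input beam radius that a thin lens focuses to waist_mm."""
    return wavelength_mm * focal_length_mm / (math.pi * waist_mm)


w0 = matched_waist_mm(1.55)                       # 1.55-mm bore from the text
w_in = beam_radius_at_lens_mm(10.6e-3, 300.0, w0)  # 10.6 um, 30-cm lens
print(f"matched waist radius: {w0:.2f} mm")
print(f"input beam radius at lens: {w_in:.2f} mm")
```

With these assumptions a beam of about 2-mm radius at a 30-cm lens would be matched to the 1.55-mm bore; a shorter focal length would call for a proportionally smaller input beam.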
Fig. 5. Launching a free-space Gaussian beam into the waveguide by means of a lens.





As a secondary feature, the glass or other visibly transparent 
waveguide tube can act as a light pipe to carry visible light through 
the arm. This light can be used for illuminating the work area, or for 
aiming the output beam. For this application the mirrors should be 
highly reflective in the visible as well as the infrared range. A suitable 
material is evaporated silver. 

With the above concept in mind, various components of a possible
waveguide articulating arm were tested in the laboratory, using a
10-μm CO₂ laser as a source. A 13.9-cm length of fused quartz tubing with
1.55-mm bore was tested. When a 30-cm focal length lens was used to 
focus the radiation into the guide, a transmission of 93 percent was 
found. To test the effect of small misalignments, the tube was pivoted 
off axis around the input point by one-half of one degree, and the 
transmission dropped by only 2 percent. Finally, a mock-up of a corner 
elbow (see Fig. 4) was made on the bench and a transmission of 95 
percent was measured. This information demonstrated the feasibility 
of the idea and a complete arm was designed and fabricated (see Fig. 
3). The arm contains three sections of waveguide that are 13 cm long 
and three more that are 2 cm long. The six corner mirrors used were 
commercially obtained. They were made from silicon 1 mm thick and 
were coated with silver and a transparent protective layer. The corners 
swivel on precision ball bearings. With a total length of 40 cm the arm
can access any point in an 80-cm-diameter sphere. The completed arm
was tested and found to have a transmission of 80 percent. Power up
to 5 W cw was transmitted with no damage. The 1.55-mm-diameter
beam from the output tube was substantially single mode and could 
be focused to a near-diffraction limited spot. As we expected, there 
was no wander of the output beam relative to the output tube as the 
arm was moved. The small size and light weight made it very easy to 
manipulate the arm and to place the output beam in any desired 
position. 
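A quick check ties the tested tube to the sizing rule given earlier in this section (a bore of 50 to 200 wavelengths). The snippet below is plain arithmetic on the figures reported above; treating the arm as fully extensible in every direction when computing the reachable sphere is a simplifying assumption.

```python
WAVELENGTH_UM = 10.6          # CO2 laser wavelength
BORE_DIAMETER_UM = 1550.0     # 1.55-mm bore of the fused quartz tubing
RECOMMENDED_RANGE = (50.0, 200.0)  # bore diameter in wavelengths, from the text

bore_in_wavelengths = BORE_DIAMETER_UM / WAVELENGTH_UM
within_range = RECOMMENDED_RANGE[0] <= bore_in_wavelengths <= RECOMMENDED_RANGE[1]

ARM_LENGTH_CM = 40.0
# Assumes the tip can be pointed in any direction at full extension.
reach_diameter_cm = 2.0 * ARM_LENGTH_CM

print(f"bore is about {bore_in_wavelengths:.0f} wavelengths across; "
      f"within recommended range: {within_range}")
print(f"reachable sphere diameter: {reach_diameter_cm:.0f} cm")
```

The 1.55-mm bore is therefore near the middle of the recommended range, consistent with the low loss and good mode quality reported for the completed arm.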
IV. A ROBOTIC ARM FITTED WITH THE WAVEGUIDING ARTICULATING
CO₂ LASER ARM


The simplest approach to using the Bridges/Strnad waveguiding
CO₂ laser arm is to make the robot gripper hold and maneuver the tip
of the articulating laser waveguide. This approach, however, has a
major drawback. Reorientation of the laser-beam output requires, in
general, rotations of all five revolute joints of the articulated
waveguide. In many cases, this complex reconfiguration of the articulated
waveguide prevents a continuous rotation of the laser-output tip and
requires the robot to follow a complicated path.

In addition, a force and torque sensor on the gripper would be
essential to ensure that the articulated waveguide is not damaged by
the robot as it attempts to impose a particular five-link configuration.
This is still beyond state-of-the-art robotics, since even the
turning of a simple two-link crank by a robot arm is a complex
compliance problem not yet satisfactorily solved.

A second approach is to fit the waveguide within or beside the robot. 
Because of the required 90-degree revolute joint articulations, this is 
not a trivial task. In fact, many existing robots have prismatic joints 
and/or unsuitable dimensions. 

We now propose a new robot system consisting of two arms: master
and slave. The master arm is positioned by motors, whereas the slave
arm only carries the waveguide. The slave arm is a Bridges/Strnad-type
five-link waveguiding laser arm with 90-degree rotational joints.
Unlike the original Bridges/Strnad arm shown in Fig. 3, it has
nine (rather than six) mirrors and a different link geometry, as
described below. The master arm is, for example, a Microbot Alpha*
whose hand gripper and side casing have been removed. The robot has
a repeatability of ~250 μm and a positioning speed of 50 cm/s. The
two arms are connected “in parallel” as follows.

Figure 6 shows schematically the connection between the master
and the slave arms. The mirrors of the slave arm are labeled ‘b’ to ‘j’.
Laser input and output are at ‘a’ and ‘k’, respectively. Five mirrors are
rigid and four (‘b’, ‘e’, ‘g’, and ‘h’) are movable. The axes of rotation
of the master arm are indicated by rotation angles θ₁ to θ₅. Except for
axis 5, the axes of rotation of the slave arm coincide with the
corresponding axes of rotation of the master arm. For example, a rotation
θ₃ of the master arm corresponds to an equal-angle rotation of the
slave arm about the direction ‘f’-‘g’. The slave arm direction ‘i’-‘j’
does not coincide with, but is parallel to, the θ₅ axis of the master arm.
The connection between these two axes is through a pair of identical gears,
as shown in Fig. 7. The connection between the two arms at the other
four axes of rotation is via rigid mounts (not shown in Fig. 6) except
for axis 3, where a moderately compliant plastic mount is used for
attaching the two arms. This compliant connection compensates for possible
slight misalignments between the first four pairs of axes and thus
prevents damage to the slave arm.

* New industrial-quality product of Microbot, Inc.

Fig. 6. Schematic diagram showing the master-slave configuration of the CO₂
laser beam positioning robot.

A precise description of the two-arm assembly is conveniently given
in Denavit-Hartenberg’® notation, which is standard for robots.
The five links of the slave arm are defined as follows. The origin is at
the intersection between axes θ₁ and θ₂. Link 1 is segment ‘bcde’; Link
2 is segment ‘defg’; Link 3 is segment ‘fghi’; Link 4 is segment ‘hij’;
and Link 5 is segment ‘ijk’. The exact geometry is given in Table I,
where αᵢ is the twist angle, aᵢ is the ith link length, and dᵢ is the
(i−1)th to ith link distance. These definitions correspond to the conditions:
1) a₂ = a₃ = ‘hg’ = ‘ef’; 2) ‘hi’ = ‘gf’ = ‘de’ = 2d₄; 3) ‘ab’, ‘cd’, ‘ij’,
and ‘jk’ have arbitrary length.

The transmitted power efficiency should remain high. Extrapolating 
from the 6-mirror configuration, approximately 70-percent efficiency 
is expected. The resolution is determined by the master arm. In our 
case it is approximately 250 μm. Note that the slave arm is detachable
so that the robot can be used for other tasks. 
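The 70-percent estimate can be reproduced with a simple extrapolation. The sketch below assumes, purely for illustration, that the 80-percent transmission measured for the six-mirror arm is shared equally among six identical mirror-plus-waveguide stages and that the nine-mirror arm adds three more stages of the same kind.

```python
MEASURED_TRANSMISSION = 0.80  # six-mirror arm, Section III
MIRRORS_MEASURED = 6
MIRRORS_PROPOSED = 9

# Assumption: every mirror stage contributes the same transmission factor.
per_stage = MEASURED_TRANSMISSION ** (1.0 / MIRRORS_MEASURED)
predicted = per_stage ** MIRRORS_PROPOSED

print(f"per-stage transmission: {per_stage:.3f}")
print(f"predicted nine-mirror transmission: {100.0 * predicted:.0f}%")
```

Under this equal-stage assumption the prediction comes out near 72 percent, in line with the approximately 70 percent quoted above.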
Fig. 7. Detail of the gear connection of the 4th and 5th revolute axes of the
master arm with the ‘h-i’ and ‘i-j’ revolute axes of the slave arm.


Table I. Denavit-Hartenberg manipulator parameters for the two-arm robot system

        Master                                     Slave
Link    Twist angle  Link length  Link distance    Twist angle  Link length  Link distance
        αᵢ (deg)     aᵢ (cm)      dᵢ (cm)          αᵢ (deg)     aᵢ (cm)      dᵢ (cm)
1       −90          0            0                −90          0            10.0
2       0            17.78        0                0            17.78        0
3       0            17.78        0                0            17.78        0
4       +90          0            0                +90          0            1.27
5       0            0            0                0            0            0

V. ROBOTIC MATERIAL PROCESSING 


The use of high-power Nd:YAG lasers and CO₂ lasers for material
processing such as welding, cutting, drilling, scribing, trimming, heat
treating, and annealing is well documented.? Thermally and
photochemically induced reactions are also well known. The advantages
of robot-laser processing of materials are dexterity in maneuvering the
laser beam, the ability to process complex-shaped materials, and
versatility in adapting to changing environments and changing
material-processing functions. Existing nonrobotic, dedicated laser material
processing apparatus”® is much more restricted and is expensive to
modify should work requirements change. The combination of robots
and high-power lasers is a natural technological direction to pursue
for more versatile material handling and processing. The various forms
of optical waveguides we describe here provide the important, and
maybe indispensable, flexible links between robots and high-power
lasers.

The use of these flexible, lightweight optical fiber waveguides (as- 
suming they will also be available at long IR in the near future) for 
delivering high-power laser radiation to robots for material processing 
has several distinct advantages: 

1. The bulky, heavy laser system could be remotely located so that 
any high electromagnetic interference (noise interfering with computer 
signal control of the robot and data transmission) could be eliminated. 

2. The use of lightweight flexible optical fiber waveguides on the 
robot arm (or body) allows laser-material processing in mobile robots 
without undue constraints on their mobility. 

3. Since the work space is not crowded by the use of high-power 
lasers, multiple robots can work together simultaneously in a compli- 
cated laser-material-processing task. 

4. The use of several different kinds of high-power lasers at different 
wavelengths in a single robot can be achieved easily by routing multiple 
waveguides of different types through the robot, with appropriate 
shutters to control the switching of laser beams. 

Our preliminary experimental results show that we can transmit 
(deliver) 5W of cw Nd:YAG laser power to the silica glass fiber output 
suitable for laser soldering. With high-power Nd:YAG lasers and more 
effort in fiber design and coupling, we expect to be able to deliver more 
than 10W (cw) through single-mode fibers and more than 25W (cw) 
through multimode, large-core silica glass fibers. Since the damage
threshold for silica glass fibers is in the GW/cm² range, pulsed lasers
of high peak power also can be transmitted. With such lightweight
optical fibers giving out such high-output laser power at the fingertip 
(gripper) of a robot, even a small inexpensive robot can perform many 
complicated material-processing or microprocessing functions. 
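To give a sense of scale for the damage-threshold remark, the sketch below converts an intensity limit into a peak power for a given core size. The threshold of exactly 1 GW/cm², the uniform intensity across the core, and the two core diameters are all illustrative assumptions; the text states only that the threshold is "in the GW/cm² range".

```python
import math


def max_peak_power_w(core_diameter_um: float,
                     damage_threshold_w_per_cm2: float = 1.0e9) -> float:
    """Peak power at which the core-averaged intensity reaches the threshold.

    Assumes the power is spread uniformly over the core cross section.
    """
    radius_cm = core_diameter_um * 1.0e-4 / 2.0
    return damage_threshold_w_per_cm2 * math.pi * radius_cm ** 2


# Hypothetical core sizes: a small single-mode core and a large multimode core.
for label, diameter_um in [("10-um core", 10.0), ("200-um core", 200.0)]:
    print(f"{label}: about {max_peak_power_w(diameter_um) / 1e3:.1f} kW peak")
```

Under these assumptions a large-core fiber could carry peak powers in the hundreds of kilowatts, far above the cw levels discussed here, which supports the statement that high-peak-power pulses can also be transmitted.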

With long-IR lasers such as CO₂ lasers and the Bridges/Strnad-type
waveguiding articulating arm, 5 W of power has been transmitted.
Transmission of much higher power (20 to 100 W) is expected before
mirror damage occurs. With future advances in low-loss long-IR fibers,
truly flexible CO₂ laser transmission at 20- to 40-W levels® can be
expected.

The positioning resolution and repeatability in robotic laser material
processing depend on the specific robot design. Robots with high positioning
precision (10-μm repeatability) and high positioning speed (1.6 m/s)’*
are fast becoming available. The use of lightweight flexible
waveguides for high-power laser delivery to the robot arm ensures that
such high positioning accuracy and speed will not be compromised.
This is another distinct, significant advantage.

In summary, the use of appropriate optical waveguides for trans- 
mitting and delivering high-power laser radiation to a robot arm will 
make possible complex robotic-laser-processing of materials. Medical 
and biological applications of robotic microprocessing with fiber- 
guided lasers can also be envisioned. These could be considered a 
special case of robotic material processing. The combination of tech- 
nology of high-power lasers, optical waveguides, and robotics will 
certainly open up a new era of laser material processing. 
Estimates of Path Loss and Radiated Power for
UHF Mobile-Satellite Systems

This paper examines the satellite power requirements for land-mobile
satellite systems, taking into account both shadow and multipath fading.
Depending upon reliability objectives, present-day satellite capabilities will
permit from 20 to 200 “Advanced Mobile Phone Service (AMPS)-like” circuits.
Since satellite systems cost hundreds of millions of dollars, mobile-satellite
telephony in a conventional sense would be expensive. Techniques are
examined that in the far term may permit a factor-of-10 increase in capacity
while still using moderate-size satellites.


I. INTRODUCTION 


For the next decade or more the new 900-MHz land-mobile systems
using cellular concepts will be introduced only in large cities. Any kind
of nationwide service with full coverage will take many years. However,
there are many nonurban applications where telephone service would
be highly desirable, including service to vehicles along the nation’s
interstate highways, to rural residences currently without means of
obtaining wire-line service, and to aircraft.’* Communication from a
satellite to small, portable terminals has been achieved, primarily
demonstrating technical feasibility.*° Other studies examined system
costs assuming satellite payloads much larger than current capability,
and paying little attention to propagation effects.°® Although the
monetary costs looked rather favorable, no such satellites or launch
capabilities are expected in the near future. Here, the
capacity-performance trade-offs are examined on realistic assumptions of
near-term satellite capabilities, considering propagation conditions in some
detail.

The issue of cost is not addressed except to mention that a modern, 
one-of-a-kind* satellite, in orbit, might be expected to cost well over 
100 million dollars. At the same time the circuit capacity compared to 
a satellite system with large, fixed ground antennas will be smaller. 

The procedure for the remainder of this paper is first to calculate 
an optimistic link budget based upon the assumption that there is a 
line-of-sight path between the satellite and the mobile system. Follow- 
ing this, estimates are made for the additional losses on the path due 
to obstructions near the mobile system; then an attempt is made to 
determine whether it is feasible to obtain sufficient radiated power in 
the satellite to make up these losses; finally, more spectrum- and 
power-efficient options are examined. 


II. LINE-OF-SIGHT LINK BUDGET


The radiated power requirements on the satellite resources will be 
determined by working backwards from the mobile system. Assume 
for now frequency-division multiple access with a minimum carrier- 
to-noise ratio (CNR) of 10 dB in a 20-kHz bandwidth. These numbers 
are roughly consistent with the FM threshold and bandwidth require- 
ments for present-day cellular mobile systems and would correspond 
to something that might be feasible in terms of a digital system in the 
future. These numbers imply 10 log (C/kT) = 53 dB-Hz (10 dB plus 10 log 20,000).

Since the mobile system can be driven in any direction, it is 
necessary that the antenna be omnidirectional in the azimuthal plane; 
however, since the satellite is never directly overhead, it is possible to 
form a conical beam in the elevation plane, and obtain a gain of about 
6 dB. Assume a system noise temperature of 400 degrees for the mobile 
system. Although lower-noise receivers certainly could be built, limi- 
tations of man-made noise will prevent the effective use of lower-noise 
receivers; furthermore, the additional cost of ultra-low-noise receivers 
in mobile systems may be prohibitive. These two assumptions imply 
a G/T = —20. With these numbers the required illumination on the 
earth is -134 dBW/m?’. The path loss from the satellite to the mobile 
system calculates to 184 dB. As shown in Table I, the required Effective 
Isotropic Radiated Power (EIRP) per channel is 28 dBW at the 
satellite. 
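
The arithmetic behind Table I can be checked with a short sketch. The function name and the rounded value of Boltzmann's constant (-228.6 dBW/K/Hz) are illustrative assumptions; the CNR, bandwidth, noise temperature, antenna gain, and path loss are the values quoted in the text.

```python
import math

# 10*log10 of Boltzmann's constant, in dBW/K/Hz (rounded; an assumption of this sketch)
BOLTZMANN_DBW_PER_K_HZ = -228.6


def required_eirp_dbw(cnr_db, bandwidth_hz, noise_temp_k, rx_gain_db, path_loss_db):
    """Satellite EIRP needed to deliver the given CNR at the mobile receiver."""
    noise_dbw = (BOLTZMANN_DBW_PER_K_HZ
                 + 10 * math.log10(noise_temp_k)
                 + 10 * math.log10(bandwidth_hz))
    carrier_dbw = noise_dbw + cnr_db
    # Work backwards through the receive antenna gain and the path loss.
    return carrier_dbw - rx_gain_db + path_loss_db


# Values taken from the text: 10-dB CNR, 20 kHz, 400 K, 6-dB antenna, 184-dB path loss.
g_over_t_db = 6.0 - 10 * math.log10(400.0)
eirp_dbw = required_eirp_dbw(10.0, 20e3, 400.0, 6.0, 184.0)

print(f"G/T  = {g_over_t_db:.1f} dB/K")
print(f"EIRP = {eirp_dbw:.1f} dBW per channel")
```

Under these assumptions the result lands within about half a decibel of the 28 dBW listed in Table I.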


* Multiple-satellite operation with frequency reuse would be practically impossible
to achieve because the mobile antennas radiate essentially in an omnidirectional pattern, 
so there is no way to discriminate one satellite from another. 


2494 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1983 


Table I—Link budget for a single mobile-satellite channel

Minimum CNR at mobile                  10 dB
Noise bandwidth                        20 kHz
Mobile receiver noise temperature      400 K
Antenna gain                           6 dB
G/T                                    -20 dB/K
Path loss                              184 dB
Required satellite EIRP                28 dBW


It is possible to have an antenna with nearly a 15-foot diameter 
within the Space Shuttle. At 900 MHz such an antenna would provide 
an on-axis gain of 31 dB, while the gain at the edge of the country 
would be about 29 dB. We assume on the average that 30 dB of 
antenna gain is available, which implies a required radiated power per 
channel of -2 dBW for Continental United States (CONUS) coverage.
The average RF power consistent with moderate-size satellites in the 
1980s time frame is about 200W (+23 dBW), which implies 316 
satellite-mobile channels. 
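
As a rough check on these figures, the sketch below estimates the on-axis gain of a 15-foot dish at 900 MHz and the number of channels a 200-W (+23 dBW) transmitter can support. The 65-percent aperture efficiency and the function names are assumptions made for illustration; the text does not state an efficiency.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def dish_gain_db(diameter_m, freq_hz, efficiency):
    """Gain of a circular aperture: efficiency * (pi * D / wavelength)^2, in dB."""
    wavelength_m = SPEED_OF_LIGHT_M_PER_S / freq_hz
    return 10 * math.log10(efficiency * (math.pi * diameter_m / wavelength_m) ** 2)


def channels_supported(total_power_dbw, eirp_per_channel_dbw, antenna_gain_db):
    """Number of equal-power channels that fit within the total RF power."""
    power_per_channel_dbw = eirp_per_channel_dbw - antenna_gain_db
    return 10 ** ((total_power_dbw - power_per_channel_dbw) / 10)


# 15 ft = 4.572 m; 65-percent efficiency is an assumed, typical value.
on_axis_gain_db = dish_gain_db(15 * 0.3048, 900e6, 0.65)

# 28 dBW EIRP per channel with 30 dB average gain gives -2 dBW per channel.
n_channels = channels_supported(23.0, 28.0, 30.0)

print(f"on-axis gain ~ {on_axis_gain_db:.1f} dB")
print(f"channels     ~ {n_channels:.0f}")
```

The 25-dB difference between +23 dBW and -2 dBW corresponds to a factor of about 316, which is the channel count quoted above.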


III. ADDITIONAL PATH LOSSES


Excess path losses on land-mobile paths have been measured at 
numerous frequencies over a variety of paths worldwide. A great 
amount of data exists in the 800- to 900-MHz frequency band for 
land-mobile paths, but little data have been published on the losses 
over satellite-to-mobile paths. To estimate what losses might be ex- 
pected, Fig. 1 shows a plot of the median path loss in excess of the 
free space path loss as measured for various base-station antenna 
heights,? plotted in terms of the elevation angle between the mobile 
and the base station. For distances of both 1 and 2 km from the base 
station, the points lie nearly on a straight line on semilog paper. 

A recently published paper indicates that satellite path losses in the 
Denver area (elevation angle 32 degrees) range from 3 to 20 dB over 
line-of-sight.¹⁰ The author’s statistical description for excess path loss,
corresponding to 50-percent large-scale coverage, estimates 9.8-dB 
excess path loss for a suburban environment with an elevation angle 
of 30 degrees. This value is in good agreement with the curve depicting 
a suburban environment plotted in Fig. 1. For more rural locations, 
satellite measurements would predict 5.3-dB excess path loss,*® which 
tends to agree with the straight-line projection of the two data points 
taken 10 km distant from the high-elevation base stations. In all cases, 
due to the more favorable elevation angles, excess path losses are less 
severe on satellite-mobile paths compared to typical land-mobile paths, 
as long as the satellite is located at a longitude such that the slant 
range is not excessive. 

Fig. 1—Excess attenuation vs elevation angle at 900 MHz.


For a favorable satellite longitude and for most locations in the 
United States, the elevation angle to the satellite generally will be 
greater than 30 degrees and less than 60 degrees, indicating that it is 
reasonable to expect the median losses over free space to range from 
3 to 10 dB, except possibly in urban areas where greater losses would 
be expected. 

Data reported on land-mobile paths indicate that the distribution 
of the signal about the median is log normal with standard deviations 
ranging from 5 to 10 dB.® The log-normal distribution found in the 
land-mobile case arises from large-scale obstructions such as tall 
buildings and hills, which shadow the line-of-sight path. Intuitively, 
one might expect the variation of signal strength to be less severe on 
a satellite-mobile path, since typical elevation angles for a satellite are 
30 degrees or more, whereas usual land-mobile paths have elevation
angles more on the order of a degree or so. Based on the satellite data 
of Ref. 10, the variance of the log normal appears to be somewhat less 
for satellite paths than for land-mobile paths, but not dramatically so. 

Figure 2 is a plot of what might be called the expected range of 
additional losses relative to free space based on available measured 
data.¹⁰ It can be seen that high margins are required to provide service
to approximately 99 percent of the regions of the country. Even
assuming most regions of service interest are rural, margins in excess
of 10 dB are often required. Furthermore, this would assume the
propagation characteristics of the entire country are similar to those
in the Denver area. A far safer approach (and perhaps more accurate)
is to employ the suburban model to represent the small cities and
towns where the bulk of the demand may be expected. In this case, to 
cover 90 percent of locations would require a signal approximately 16 
dB above the line-of-sight value, and the value would increase to 21 
dB for 99-percent coverage. 
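
The suburban margins quoted above are consistent with a log-normal shadowing model in which the margin equals the median excess loss plus a multiple of the standard deviation. The sketch below illustrates this. The 9.8-dB median is taken from the text; the 4.8-dB standard deviation is an assumption, chosen here because it reproduces the quoted 16-dB and 21-dB values, and is not a figure stated by the author.

```python
from statistics import NormalDist


def coverage_margin_db(median_excess_db, sigma_db, coverage):
    """Margin above line-of-sight needed to cover the given fraction of locations,
    assuming the excess path loss in dB is normally distributed."""
    z = NormalDist().inv_cdf(coverage)
    return median_excess_db + z * sigma_db


SUBURBAN_MEDIAN_DB = 9.8   # median excess loss quoted in the text
ASSUMED_SIGMA_DB = 4.8     # assumption: back-fitted to the quoted margins

margin_90 = coverage_margin_db(SUBURBAN_MEDIAN_DB, ASSUMED_SIGMA_DB, 0.90)
margin_99 = coverage_margin_db(SUBURBAN_MEDIAN_DB, ASSUMED_SIGMA_DB, 0.99)

print(f"90% coverage: {margin_90:.1f} dB")
print(f"99% coverage: {margin_99:.1f} dB")
```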


IV. MULTIPATH FADING 


Another factor crucial to system performance is the effect of multipath
fading. A line-of-sight component can be expected frequently on the
satellite path. This component, plus signal components that scatter
into the mobile antenna from nearby objects, produces a Rician signal
distribution with significantly less fading than occurs with a
Rayleigh-distributed signal. Data on satellite paths confirm Rician-like
signal statistics.¹⁰ For example, level crossing rates can be an order of
magnitude or more lower compared to Rayleigh fading. Unfortunately,
fading will be worst just at the wrong time: when the line-of-sight
component is obscured and the received signal strength is low. Under
these conditions the received signal can be assumed Rayleigh, and
expected performance has been calculated.¹¹ In white Gaussian noise, a
10⁻³ Bit Error Rate (BER) is attained with an average energy per bit
(E_b) 6.9 dB above the noise density (N_0) for biphase coherent phase
shift keyed signals, while in the presence of Rayleigh fading an average
E_b/N_0 of 24 dB is required. Spatially separated antennas whose signals
are combined in phase can significantly reduce bit error rates. Figure 3
is a plot of BER vs E_b/N_0 for various numbers of diversity elements.
For a 10⁻³ BER with three elements, E_b/N_0 per element drops to 7 dB,
and with eight diversity elements it drops to 0 dB. This somewhat
surprising result is readily understood if one considers the eight
elements as an antenna array, whose effective gain is 9 dB higher than
a single element.

Space diversity works well on mobile systems, even with closely
spaced elements (<1λ) that have highly correlated (0.5) signals.°
However, space diversity cannot be achieved at the satellite because
the arriving signal is essentially a plane wave, and extremely large
separation of the satellite antennas would be required. A technique
called retransmission diversity has been suggested for both analog®
and digital transmission.’” The idea is to transmit the conjugate phase
of the received carrier on the same or a closely spaced carrier. By
doing so, the signals from all mobile diversity branches will automatically
arrive in phase at the satellite. In the original thinking for land-mobile
system use, retransmissions would occur at the base stations,
thereby simplifying the mobile system. In this instance, the only
workable method is to place the retransmission apparatus at the mobile
system. The analog scheme mentioned above would be very difficult
to build, since the transmit and receive frequencies are closely spaced
(<100 kHz), and the two signals differ in power by many tens of
decibels. The digital techniques employ packet transmissions, and any
hope for compatibility with cellular systems would be lost.

Fig. 2—Estimates of signal strength distribution on satellite-mobile paths.

Fig. 3—Error probability of Two Coherent Phase Shift Keying (CPSK) with Rayleigh
fading and diversity.
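
The E_b/N_0 values quoted in this section can be compared with standard closed-form error-rate expressions for coherent biphase PSK. The sketch below is an illustration under stated assumptions, not the calculation used for Fig. 3: it assumes maximal-ratio combining of independent, equal-strength Rayleigh branches, whereas the paper's curves may include branch correlation or a different combining rule. The function names are hypothetical.

```python
import math


def ber_bpsk_awgn(ebn0_db):
    """Coherent BPSK in white Gaussian noise: 0.5 * erfc(sqrt(Eb/N0))."""
    gamma = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(gamma))


def ber_bpsk_rayleigh(ebn0_db_per_branch, branches=1):
    """Coherent BPSK in Rayleigh fading with L independent branches combined
    by maximal-ratio combining (standard textbook closed form)."""
    gamma = 10 ** (ebn0_db_per_branch / 10)
    mu = math.sqrt(gamma / (1 + gamma))
    head = ((1 - mu) / 2) ** branches
    tail = sum(math.comb(branches - 1 + k, k) * ((1 + mu) / 2) ** k
               for k in range(branches))
    return head * tail


for label, value in [
    ("AWGN, 6.9 dB", ber_bpsk_awgn(6.9)),
    ("Rayleigh, no diversity, 24 dB", ber_bpsk_rayleigh(24.0)),
    ("Rayleigh, 3 branches, 7 dB each", ber_bpsk_rayleigh(7.0, 3)),
    ("Rayleigh, 8 branches, 0 dB each", ber_bpsk_rayleigh(0.0, 8)),
]:
    print(f"{label:34s} BER = {value:.2e}")
```

Each of the four operating points evaluates to a bit error rate just under 10⁻³, in line with the values read from Fig. 3.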

If the system were not constrained in bandwidth, then similar results 
could be obtained by employing frequency diversity channels. Beyond 
its obvious inefficiency, another drawback with frequency diversity is 
that the satellite must transmit the same signal on multiple channels, 
which requires more equipment and power. Another approach that 
achieves diversity advantage is to employ frequency hopping or spread- 
spectrum techniques.’* This would eliminate the retransmission prob- 
lem of space diversity but with possible reduction in capacity. An 
interesting possibility would be to combine space diversity with spread 
spectrum, using space-diversity mobile reception (because satellite 
power is at a premium) and spread-spectrum mobile transmission to 
combat Rayleigh fading. 

Table II summarizes the per-channel power margins required for
the various conditions of shadow and multipath fading, as discussed
previously. We assume that a threshold of 10⁻³ BER is achieved with
a calculated signal-to-noise ratio (s/n) = 10 dB. This allows 3.1 dB of
implementation margin for filter and transmission line losses, antenna
pointing errors, and nonideal detection equipment. The first observation
from this table is that, without diversity, satellite-mobile communication
in dense urban areas is unachievable for the majority of
locations. However, it is expected that cellular mobile systems will
handle this traffic anyway. A margin of 15 dB with two-branch
diversity would provide service to 99 percent of the rural locations, 60
percent of suburban locations, but only 25 percent of the urban
locations.


Table II—Power margin in excess of free space propagation,
required to overcome shadow fading, and attain BER < 10⁻³ in
Rayleigh fading

Margin Above Line-of-Sight (dB)
(No Div. = no diversity; 2-Br. and 8-Br. = 2-branch and 8-branch diversity)

Percent      Urban Environment        Suburban Environment     Rural Environment
Coverage     No Div.  2-Br.  8-Br.    No Div.  2-Br.  8-Br.    No Div.  2-Br.  8-Br.

50%          31       18     13       27       14     9        22       9      4
90%          41       28     23       33       20     15       25.5     12.5   7.5
99%          49       36     31       38       25     20       28       15     10

Using Figs. 2 and 3, together with the link budget calculated in 
Table I, we can estimate quality of service for a given satellite EIRP. 
Table III gives the percentage of rural or suburban locations where 
performance exceeds the given BER for the indicated per-channel 
satellite transmitter power and for mobile systems with three diversity 
branches. Having only two diversity branches would increase power 
requirements by 3 to 5 dB, while having four diversity branches would 
lower the transmitter requirement by 2 to 3 dB, depending on the 
chosen threshold. 

It is clear that typical satellite configurations are severely power 
limited when it comes to providing mobile services. Previously we saw 
that 316 circuits were available when all mobiles are line-of-sight. 
Permitting 15 dB of margin on each circuit reduces the capacity to 
only 10 circuits. Thus, ways of obtaining more EIRP must be found. 


V. ADDITIONAL SATELLITE EIRP 


It appears obvious that an approximate 10-dB signal-strength mar- 
gin will be required for any reasonable satellite-mobile system. In 
Section II, calculations showed that 200W of RF power were required 
for 316 channels based on line-of-sight propagation. If only 32 channels 
were used, then an additional 10 dB of radiated power per channel 
would be available. However, cost estimates in this paper’s introduc- 
tion indicate that this would be almost certainly a cost-ineffective 
approach. On the other hand, the state-of-the-art cannot provide 2 
kW of RF power in a satellite today; thus, we look to other means of 
effecting higher EIRP or its equivalent. 

Table III—System performance for mobiles with three diversity
branches

Percentage of locations with BER less than value, by satellite EIRP/channel

             33 dBW              38 dBW              43 dBW
BER Value    Rural   Suburban    Rural   Suburban    Rural   Suburban
10⁻²         96      45          >99.9   82          >99.9   91
10⁻³         50      18          97      50          >99.9   78
10⁻⁴         12      6           82      35          99.7    65


The question of efficient multiple-channel satellite transmission is
very complex. Given a total payload mass for power amplifiers and
solar cells and batteries, what techniques best satisfy the system 
requirements? When there are only a few channels, a single-amplifier- 
per-channel operation is usually the simplest. For a large number of 
channels, the hardware complexity of a multitude of amplifiers and of 
multiplexing these RF signals onto an antenna feed necessitates 
another approach. Multicarrier operation (for analog or digital signals) 
simplifies the satellite tremendously but at a cost of power efficiency 
and potential intermodulation distortion. Transmitting digitally mul- 
tiplexed signals from the satellite eliminates intermodulation and 
allows efficient high-power class C amplification, but requires that all 
signals be of the same power and that mobiles have high-speed Time- 
Division Multiple Access (TDMA) receivers. These issues as well as 
other transmission-efficient techniques are addressed in the following 
subsections. 


5.1 Power control 


We note that it is wasteful to provide all vehicles with the 10 dB or
so margin to ensure that most of the vehicles have a signal above
threshold. After all, some vehicles will be line-of-sight to the satellite
and require substantially less power than those behind a mountain.
Ideally, just enough power should be made available to ensure that
each vehicle has a signal above threshold. This can be accomplished
by using the technique illustrated in Fig. 4. The automatic gain-control
signal is applied to a second transmitter carrying the message from
the ground to the satellite. For illustration, the circuit is shown in the
satellite, but it could be located at the earth station just as well if the
satellite amplifier is linear. Because of the path delays, it could take a
half-second or so to make a power adjustment. Therefore, the technique
can be applied against slowly varying shadowing such as hills
or large terrain features but not against multipath fading.

Fig. 4—A technique to reduce shadow fading on mobile-satellite paths.

Recent work of Yeh and Schwartz allows us to calculate the total
power expected from the sum of any number of log-normally distributed
carriers.“ Figure 5 contains plots that show the mean decibel
value of a log-normal distribution that is derived from the sum of a
number of log normals with the same mean (0 dB) and standard
deviations (σ = 2.5, 5, 7.5, and 10 dB); these σ's correspond roughly to
rural, suburban, urban, and dense urban environments. For example,
the resultant of summing 100 carriers whose standard deviations are
5 dB is (approximately) a log normal whose mean is 22.5 dB. This
means that providing a total peak RF power that is 22.5 dB above
that of a single carrier would satisfy the power demand 50 percent of
the time when 100 log-normal carriers are individually transmitted.
Figure 6, similar to Fig. 5, is a plot of the mean plus twice the standard
deviation of the resultant log normals when a number of identically
distributed carriers are summed. Permitting an average power of the
value shown in Fig. 6 reduces to 4.3 percent the time fraction that
power is not available to meet demand. Compared with the previous
example of 100 individually transmitted carriers with σ = 5 dB, the
peak power must now be 24.5 dB above that for a single carrier, about
a 2-dB increase.

Fig. 5—Mean decibel values of log-normal distribution derived from log normals with
same mean and standard deviations.

Ensuring 95.7-percent coverage, assuming a standard deviation of
5 dB for a single log-normal carrier, requires a margin of 10 dB (2σ)
above its mean. Thus, it appears that transmitting each carrier with 
power just sufficient to overcome the path attenuation results in a 
power savings of about 5.5 dB compared to transmitting all signals 
with power 10 dB above the mean. 
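
The 5.5-dB figure follows from comparing two total-power budgets for 100 carriers, as sketched below. The 24.5-dB value is the one read from Fig. 6 in the text; the variable and function names are illustrative only.

```python
import math


def fixed_margin_total_db(n_carriers, margin_db):
    """Total power, relative to one carrier at its mean level, when every
    carrier is transmitted with the same fixed margin."""
    return 10 * math.log10(n_carriers) + margin_db


N_CARRIERS = 100
FIXED_MARGIN_DB = 10.0           # 2-sigma margin for sigma = 5 dB
POWER_CONTROLLED_TOTAL_DB = 24.5  # mean + 2 sigma of the summed carriers (Fig. 6)

fixed_total_db = fixed_margin_total_db(N_CARRIERS, FIXED_MARGIN_DB)
savings_db = fixed_total_db - POWER_CONTROLLED_TOTAL_DB

print(f"fixed-margin total   : {fixed_total_db:.1f} dB above one carrier")
print(f"power-control total  : {POWER_CONTROLLED_TOTAL_DB:.1f} dB above one carrier")
print(f"savings              : {savings_db:.1f} dB")
```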


Fig. 6—Plot of mean plus twice the standard deviation of resultant log normals when
identically distributed carriers are summed.


5.2 Multicarrier considerations


The calculations above imply that each carrier is transmitted sepa- 
rately. Usually it is far more efficient from the point-of-view of 
spacecraft hardware and complexity to combine signals and transmit 
them from a single high-power amplifier rather than employ many 
(hundreds) of low-power amplifiers. Since the signals add in voltage, 
the peak power requirement will be considerably above the average; 
and since amplifiers are not ideal, intermodulation results. Calcula- 
tions of intermodulation distortion for the case of equal-amplitude but 
randomly phased signals for both ideal and typical power amplifiers 
have been made by Saleh.’ Typical results are shown in Fig. 7. To 
achieve the minimum acceptable C/(I + N) of 10 dB requires that the 
average power of an ideal amplifier be “backed off” 3 dB from its peak 
power and that any realistic amplifier be backed off 3.5 dB from its 
peak power point. For more typical operating conditions, C/(I + N)> 
15 dB, back-offs would be 5 to 7 dB, depending on the amplifier and 
compensation used. This implies that, even for a power amplifier 
which is 50 percent efficient at saturation when operated in the linear 
region, the dc-to-RF efficiency can only be about 12 percent. Still 
there can be an overall weight savings in the satellite compared to 
using individual amplifiers, but quantitative calculations are beyond 
the scope of these considerations. 
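
The drop from 50 percent to roughly 12 percent efficiency can be illustrated with a simple model. The sketch assumes that the dc input power stays fixed while the RF output is backed off, as in a class-A-like linear amplifier; this is an assumption made here for illustration, and real amplifiers may behave somewhat better.

```python
def backed_off_efficiency(saturated_efficiency, backoff_db):
    """dc-to-RF efficiency after output back-off, assuming constant dc input power."""
    return saturated_efficiency * 10 ** (-backoff_db / 10)


# Back-offs of 5 to 7 dB are the range quoted for C/(I + N) > 15 dB.
for backoff_db in (5.0, 6.0, 7.0):
    efficiency = backed_off_efficiency(0.50, backoff_db)
    print(f"{backoff_db:.0f}-dB back-off: {100 * efficiency:.1f} percent")
```

At the middle of the quoted range (6 dB), this model gives about 12.6 percent.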

The power back-off numbers cited are calculated for equal-amplitude
carriers with random phases. For the mobile-satellite case with
power control, the carriers have independent randomly distributed
phases, but the amplitudes are log-normally distributed. Greenstein
has reasoned that similar results should be expected for this case;
however, the calculations remain to be done.*

* Simulations of L. Greenstein and A. Saleh have shown that for sample cases with
as few as 100 log-normal carriers with random phases, the resultant envelope tends to
be Rayleigh distributed. Thus, at least over time periods where the amplitudes of the
carriers, and thus the average power, can be considered fixed, the calculated results
based on equal amplitude carriers should be usable.

Fig. 7—Back-off required to achieve a given C/(I + N).


5.3 Resource sharing and coding


Resource sharing has been suggested for TDMA systems as a means
to increase a link margin by nearly 10 dB. The idea is to assign the
user in a shadow fade a longer time slot, and encode the signal. For
digital mobile-satellite service, this technique could be considered, but
for analog modulation its implementation is less obvious. For resource
sharing to be effective, it is necessary that the majority of mobiles not
require the use of the additional resource of a coded signal. For a given
system bandwidth, the cost of resource sharing is a reduction in
transmission rate, in exchange for an increase in margin. For 12/14 GHz
fixed-satellite systems, the intent is to apply resource sharing only to
those stations experiencing rain fading. Even using a rate 1/3 code (i.e.,
three transmitted bits per information bit) for these users, it was
estimated that the loss in throughput is only a few percent, since very
few users need resource sharing simultaneously.’® As shown in Fig. 8,
the throughput drops dramatically when the fraction of simultaneous
users of resource sharing becomes significant. The curve in Fig. 8
assumes a fixed total system bandwidth and a rate 1/3 code for users
of resource sharing. For example, when 10 percent of the users need
resource sharing, the total number of users decreases by 17 percent
while, if 50 percent of the users need resource sharing, the system
capacity drops to 50 percent of the original value. Ensuring that at
any one time only a small percentage of the mobiles require resource
sharing implies that the system normally operates with a margin
somewhat above the median excess path loss. From Fig. 2 we see that
(depending upon the degree of optimism) providing somewhere between
6 and 12 dB extra power over free space propagation would
ensure that fewer than one third of the mobiles would require resource
sharing at any given time. The additional margin obtainable from
resource sharing then can extend the range of coverage from approximately
two thirds of all locations to more than 90 percent. As in the
case of power control, resource sharing cannot be applied instantaneously
because of the time delays involved from the time of measurement
to the application of coding. Thus, the application is only for
cases of slowly varying changes in signal strength.

Fig. 8—Capacity loss with resource sharing using rate one-third code.
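
The two points quoted from Fig. 8 are reproduced by a simple occupancy model, sketched below. The model is an assumption of this sketch rather than the stated basis of the figure: each user of resource sharing is taken to occupy 1/rate times the channel resource of an ordinary user, with total bandwidth held fixed.

```python
def relative_capacity(fraction_sharing, code_rate=1 / 3):
    """Number of users supported, relative to the case with no resource sharing,
    when a fraction of users transmit with the given code rate."""
    expansion = 1 / code_rate  # resource used by a coded user, relative to uncoded
    return 1 / (1 + fraction_sharing * (expansion - 1))


for fraction in (0.0, 0.1, 0.5):
    capacity = relative_capacity(fraction)
    print(f"{100 * fraction:3.0f}% sharing -> {100 * capacity:.0f}% of original capacity")
```

With 10 percent of users coded, the model gives about 83 percent of the original capacity (a 17-percent decrease), and with 50 percent coded it gives 50 percent.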

The original resource sharing concept assumed that all users shared 
a single wideband channel. Implementation in a frequency-channelized 
system is less straightforward. However, if the system is not bandwidth 
limited, then all channels could use, say, a rate 1/3 code, thereby 
gaining an advantage of roughly 4 dB, plus or minus a decibel, 
depending upon the constraint length of the code and the particular 
implementation of the decoder.'” Since the channel bandwidth is now
three times wider, the mobile receiver is degraded by about 5 dB; thus the
large apparent gains of resource sharing cannot be realized.


5.4 Trade-offs between radiated power and antenna gain 


There are two ways to increase effective radiated power from a 
satellite. The first is simply to increase the transmitter power. The 
second is to increase the antenna gain. Although the techniques are 
equivalent in terms of the radiation to a point on earth, each imple- 
mentation has different ramifications. Increasing radiated power is 
relatively straightforward in that the coverage area remains the same, 
and the burden on the satellite is to obtain more dc power through the 
use of more solar cells and to provide more battery power for eclipse 
operation. Note that the eclipses only occur at night when, presumably,
usage and power requirements would be greatly reduced. 

Normally the satellite-mobile service would not be a broadcast mode, 
that is, it would not be necessary to talk to more than one mobile 
system over a single channel; therefore, the information to that mobile 
system can be confined to a small antenna beam. Thus, having a large 
number of high-gain beams covering the entire country becomes power 
efficient, since the power is only radiated in the beam intended for 
that mobile.!”° An additional benefit of spot beams is the possibility 
that the channels can be reused between areas that are spatially 
separated by a few beamwidths. In the case of conventional point-to- 
point telephony, demand is spatially nonuniform and highly peaked. 
If such is the case here, the advantages of reuse cannot be fully 
realized. 

In a recent note it is proved for idealized constraints that maximum 
EIRP is obtained when the communications payload of the satellite is 
divided equally between the RF power subsystem and the antenna 
subsystem.'® Sample calculations indicate that at frequencies above 1 
GHz, multibeam antennas are more effective in increasing EIRP 
compared to using United States coverage antennas and high-power 
amplifiers. Earlier it was stated that a satellite providing 53-dBW 
EIRP (30-dB antenna gain, 200W RF power) could be achieved fairly 
conveniently. With battery backup either reduced or eliminated, 55 
dBW should be attainable in an advanced state-of-the-art satellite 
using a United States coverage antenna. 

Plotted in Fig. 9 are two curves that show EIRP as a function of 
payload mass. The lower curve is for a satellite with a United States 
coverage antenna where EIRP is increased only through increased 
transmitter power. The upper curve is the case where EIRP is maxi- 
mized by dividing the payload equally between the antenna and the 
transmitter subsystems. At the point where a satellite with a United 
States coverage antenna provides 55 dBW, the maximum EIRP avail- 
able is 58.2 dBW, assuming gain is achieved at 50 times isotropic per 
kg, and RF power is produced at 2.5 W/kg.* Under these conditions 
the antenna would weigh 73 kg and have a gain of 35.5 dB, implying 
three or four zone beam coverage of the United States, and a total RF 
power of approximately 180W would be transmitted. 
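
The equal-split result and the numbers in this example can be reproduced with a short sketch. It uses the two constants quoted in the text (gain of 50 times isotropic per kg and 2.5 W of RF power per kg); the 146-kg payload, the 20-kg mass for a 30-dB coverage antenna, and the function names are inferred or hypothetical values used only for illustration.

```python
import math

GAIN_PER_KG = 50.0      # antenna gain (times isotropic) per kg, from the text
RF_WATTS_PER_KG = 2.5   # RF power per kg of power subsystem, from the text


def eirp_dbw(antenna_kg, power_kg):
    """EIRP when gain and RF power are each proportional to subsystem mass."""
    return 10 * math.log10(GAIN_PER_KG * antenna_kg * RF_WATTS_PER_KG * power_kg)


def best_antenna_mass(payload_kg, steps=1000):
    """Brute-force search for the antenna mass that maximizes EIRP."""
    candidates = [payload_kg * i / steps for i in range(1, steps)]
    return max(candidates, key=lambda a: eirp_dbw(a, payload_kg - a))


payload_kg = 146.0                     # assumed total payload for this example
antenna_kg = best_antenna_mass(payload_kg)
peak_eirp_dbw = eirp_dbw(antenna_kg, payload_kg - antenna_kg)

# Coverage-antenna case: 30-dB gain corresponds to 1000 / 50 = 20 kg of antenna.
coverage_eirp_dbw = eirp_dbw(20.0, payload_kg - 20.0)

print(f"best antenna mass : {antenna_kg:.0f} kg")
print(f"antenna gain      : {10 * math.log10(GAIN_PER_KG * antenna_kg):.1f} dB")
print(f"RF power          : {RF_WATTS_PER_KG * (payload_kg - antenna_kg):.0f} W")
print(f"maximized EIRP    : {peak_eirp_dbw:.1f} dBW")
print(f"coverage antenna  : {coverage_eirp_dbw:.1f} dBW")
```

Because EIRP is proportional to the product of the two subsystem masses, the search settles on an even split, and the maximized EIRP grows with the square of the payload mass.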

Use of spot beams for EIRPs near 55 dBW can generate an EIRP
increase of about 3 dB. At 900 MHz the antenna diameter is already
around 8 m to provide a gain of 35.5 dB. For large payload satellites
with EIRPs of 65 dBW, over 3 kW of RF power would have to be
transmitted if United States coverage antennas were used, while use
of a spot beam antenna with maximized EIRP permits 65 dBW of
radiated power with one quarter the payload mass. Still, the required
payload mass is 300 kg, twice the size of most current-day satellites.

* The units here are somewhat unusual, but for a given frequency, antenna gain is
proportional to the antenna surface area that weighs so many kg/m². Likewise, power
is derived from solar cells producing so many watts/kg. Thus, both antenna gain and
RF power can be expressed as functions of mass.

Fig. 9—Effective radiated power as a function of payload mass.


5.5 Source coding and narrowband modulation 


Although obvious, it should be mentioned that employing narrower 
band channels improves the predetection CNR. Although modulations 
such as companded Single-Sideband (SSB) have been demonstrated, 
their performance in multipath environments will be degraded. Like- 
wise, low-bit-rate voice coders (10 kb/s) may provide reasonable voice 
quality, but such coders are complex, and whether acceptable perform- 
ance can be achieved with channel errors is not known. However, 
since the prospects for compatibility of satellite transmission with 
present-day land-mobile radio look dismal, the possibility of using
other techniques to gain perhaps as much as 6 dB in link budget
compared to analog FM needs to be investigated further.


5.6 Summary of possible improvements in link budget 


Since it is envisioned that much more than a thousand channels are 
necessary for any practical system, the single-amplifier-per-channel 
approach is not feasible. The next most straightforward technique is 
to use multiple carriers on a single (or few) wideband channel. This 
does not rule out compatibility with FM cellular systems, but major
system problems remain. 

Multicarrier operation should result in increased payload, and thus 
higher potential EIRP, but actual increases are difficult to calculate. 
Back-off can be eliminated if the downlink signals are digital and 
multiplexed onto a single carrier; however, downlink power control 
cannot be achieved under this condition. Mobiles would require TDMA 
receivers, and for bandwidths greater than 1 MHz, reception may be 
impaired by multipath propagation. 

Table IV lists techniques that can potentially increase or decrease 
effective radiated power. For cellular-like service using a present-day 
satellite, techniques D, E, F, and G can be applied, yielding possible 
increases in EIRP from 0 to 8 dB. With SSB an additional 6 dB 
advantage may be possible (Item C). With digital modulation Item A 
is added and B replaces C, for a link-budget gain of 3 to 15 dB. With 
digital multiplexing, D and E are eliminated and H can replace A, for 
possible link-budget increases of 11 to 19 dB. Finally, using more 
diversity elements (Item I) helps the link budget significantly, espe- 
cially at low BER. 


VI. CAPACITY CALCULATIONS 


Satellite design is very complex, and no claim is made that actual
satellites can be designed with the calculated capacities. Rather, the
purpose here is to determine the effect, in a general sense, of some of
the many options available. To that end, we make the simplifying
assumption that changes in capacity are proportional to changes in
effective radiated power. Thus when bandwidth is not a constraining
factor, the number of circuits is determined by the simple relationship,

$$C = C_B \times 10^{(G_T - M)/10},$$

where C_B is the baseline capacity calculated on a line-of-sight basis
in Section II, G_T is the total of the link-budget gains (in dB) discussed
in Section V, and M is the margin (in dB) determined in Section III for a
given grade of service and terrain type.


Table IV—Possible improvements in link budget

    Technique                                                  dB Increase
A   Channel coding (Sec. 5.3)                                  2 to 4
B   Source coding (Sec. 5.5)                                   1 to 3
C   Modulation (receiver bandwidth) (Sec. 5.5)                 3 to 6
D   Power control on each channel (Sec. 5.1)                   3 to 5
E   Back-off loss (Sec. 5.2)                                   -7 to -4
F   Maximize EIRP (Sec. 5.4)                                   2 to 4
G   Reduced battery for eclipse (Sec. 5.4)                     2 to 3
H   Resource sharing (Sec. 5.3)                                6 to 9
I   Diversity elements (3 to 8) (compared to 2) (Sec. 4.0)     5 to 8
J   Double-size payload (Sec. 5.4)                             3 to 6
K   Four-times payload (Sec. 5.4)                              6 to 12


For example, assume a rather low grade of service with only a 7-dB
margin. Combine this with the most optimistic link-budget gain of 19
dB. Then the possible number of circuits is $316 \times 10^{1.2} = 5008$, and a
300-MHz spectrum allocation would be required. More realistically, a
10-dB improvement in the link budget might be obtained for a specially
designed satellite system not compatible with current cellular systems.
On the other hand, a 10-dB margin is almost essential for good service.
Taken together, the 10-dB improvement is offset by a 10-dB margin,
and the capacity calculates to the baseline value of about 300 circuits.
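
The capacity examples in this section follow from scaling the 316-circuit line-of-sight baseline by the net decibel change in the link budget. A minimal sketch, with a hypothetical function name, is:

```python
def circuits(baseline_circuits, link_gain_db, margin_db):
    """Capacity when changes in capacity track changes in effective radiated power."""
    return baseline_circuits * 10 ** ((link_gain_db - margin_db) / 10)


BASELINE = 316  # line-of-sight circuits from Section II

optimistic = circuits(BASELINE, 19.0, 7.0)    # most optimistic gain, low grade of service
realistic = circuits(BASELINE, 10.0, 10.0)    # 10-dB gain offset by a 10-dB margin
margin_only = circuits(BASELINE, 0.0, 15.0)   # 15-dB margin with no link-budget gain

print(f"optimistic : {optimistic:.0f} circuits")
print(f"realistic  : {realistic:.0f} circuits")
print(f"margin only: {margin_only:.0f} circuits")
```

The third case corresponds to the earlier observation that a 15-dB margin on each circuit reduces the baseline capacity to about 10 circuits.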


VII. DISCUSSION AND CONCLUSIONS 


It should be noted that use of diversity and tolerance of more bit 
errors can lower margin requirements significantly. For example, from
Fig. 3 we see that, at 10⁻³ BER, going from two to four diversity
elements reduces the margin requirement by 7 dB. Also, using two-
branch diversity, but setting the system threshold at 10⁻² BER instead
of 10⁻³ BER, reduces the system margin by over 5 dB. The combination
of the four-branch diversity at 10⁻² BER threshold permits a 9-dB
reduction in satellite EIRP compared to two-branch diversity and a
10⁻³ BER threshold. If this power savings could be directly traded for
capacity, then eight times the number of circuits could be achieved. 
Downsizing the baseline calculations for low-bandwidth applications 
such as paging or emergency telephony is also possible. 

As noted in Table IV, there is the potential of economy of scale. 
Satellite costs tend to run nearly linearly with weight’’, but EIRP can 
increase with the square of satellite mass;’® and provided there are no 
bandwidth constraints, channel capacity can increase in direct pro- 
portion to EIRP. Thus, for satellites of twice the size (and at least 
twice the cost) four times as many circuits are obtained. In-orbit 
satellite masses as high as 5,000 kg are envisioned using the shuttle/ 
Centaur. This represents a factor of 5 to 10 compared to present-day 
technology and suggests that future land-mobile service via satellite 
could become attractive. Trading power for capacity comes at the 
expense of bandwidth, a very precious commodity. On the other hand, 
terrestrial cellular systems will reuse frequencies hundreds of times 
nationwide. Making satellite-mobile systems spectrally efficient 
through the use of multibeam satellite antennas that can reuse fre- 
quencies is a tremendous technical challenge. Mile-diameter antennas 
are needed to get cell sizes comparable to terrestrial radio systems. 

For service to aircraft, a 6-dB antenna gain for an aircraft in level 
flight seems reasonable; thus, line-of-sight capacity numbers apply 
directly. For service to residences, an antenna gain of 16 dB is readily 
obtainable using a 1-m dish. Assuming there is a line-of-sight path 
and that there is sufficient bandwidth available, about 3000 circuits 
should be obtainable with present-day satellite capabilities, before 
power limitations become constraining. 

Finally, it is safe to conclude that (1) cellular-compatible satellite- 
mobile systems are highly unlikely to be developed in the near future, 
(2) systems with modest-coverage objectives using enhanced-capability 
satellites and high-performance mobile sets look marginally attractive, 
and (3) very large satellites offer a possibility for mobile systems in 
the long term. 
A pattern matching approach is proposed for coding of two-level pictures. 
Patterns, which are either symbols such as characters, or fractions of black 
regions, such as line segments, are extracted from the facsimile. They are 
compared and matched to already transmitted patterns, called library patterns. 
If a correct match is detected, only the position of the pattern and the 
identification of the matching library pattern are transmitted. If a pattern 
does not match any library pattern, it is added to the library and its binary 
description is transmitted. Compared to conventional two-dimensional codes, 
the compression is often doubled and is sometimes 4.5 times higher. Compared 
to a symbol-matching coding technique,” the compression has increased by 20 
to 80 percent, depending upon the document. 


I. INTRODUCTION 


Conventional two-level picture coding techniques are based on the 
statistical dependence between neighboring picture elements (pels).’ 
The calculation of entropies, according to a local source model, gives 
the maximum achievable bit rates. Run length or predictive coding 
techniques or a combination of them takes advantage of the statistical 
dependence between neighboring pels and leads to bit rates close to 
the entropy. Each exploits what can be called the microscopic (pel) 
properties of a facsimile. 

Pattern-recognition coding techniques exploit macroscopic properties 
of the facsimiles. The image source is a source of patterns such as 
characters, lines, and black spaces. We can code the facsimile more 
efficiently, since the description is closer to the perceptual level. We 
can consider two kinds of pattern-recognition coding techniques. The 
first technique is pattern (or image) understanding. It recognizes a 
certain pattern, for example a letter, that possibly includes some font 
information. The second technique is pattern matching. Here, a pat- 
tern is not recognized, but is simply matched with already transmitted 
patterns, and if a correct match is detected, it is replaced by the 
matching pattern. It does not use the image-understanding level. The 
image-understanding approach has the potential advantage of a very 
high compression, but the often important aesthetic details of the 
documents can be lost, and there is a risk of errors at the present level 
of such techniques. The matching approach yields lower compression, 
but keeps more of the original pictorial information. There are also 
lower risks of errors, since matching allows only slight modifications 
in the pattern shapes. Naturally, neither of the pattern-recognition 
techniques is lossless, since they modify the picture content. 

Ascher and Nagy’ and Pratt et al.” have already proposed facsimile 
coding techniques using matching techniques. In the system presented 
here, not only the symbols, as in Pratt’s case, but also graphical 
elements such as line segments and black regions are matched. The 
patterns are efficiently coded and updated, leading to significantly 
higher compressions. 


II. SYSTEM DESCRIPTION 


Figure 1 shows the block diagram of the system. The pattern locator 
examines the facsimile line by line. When it locates a black pel, the 
pattern isolator picks up a pattern. The pattern is either a symbol 
(defined as a set of black pels completely surrounded by white pels) 
or, when no symbol can be extracted, a fraction of the black region. 
Therefore, contrary to Ref. 2, there is no residue to be coded, since all 
black pels belong to a pattern. 

The matcher performs a template matching of the incoming pattern 
against the existing library patterns to determine whether the incoming 
pattern is similar to an already transmitted pattern. The system 
screens the library patterns to reduce the time-consuming template 
matching. Thus, we consider only the patterns that might match the 
incoming pattern. We screen by comparing features of the library 
patterns with those of the incoming pattern. We apply a very efficient 
and simple two-pass screening. If a correct match is detected, the 
matcher sends the information about the position of the pattern and 
its library identification to the coder. If no match has occurred, the 
incoming pattern is added to the pattern library. The pattern library 
is empty at the beginning of the coding and is gradually built up by 
the incoming library patterns. The matcher then also sends the infor- 
mation about the position and description of the new library patterns 
to the coder. 

Fig. 1—Block diagram of facsimile coder by pattern matching. 

A library update and management unit takes care of the addition 
and deletion of library patterns and organizes them for the quickest 
possible match and most efficient coding. All the patterns isolated 
along one line are stored in the coder. When the end of the line is 
reached, we sort the patterns, which allows a more efficient coding. 


III. LOCATION AND ISOLATION OF PATTERNS 


Patterns, in the present context, are the primitive elements of the 
coding process. They are isolated, and sent to the matching block 
sequentially, in a raster order. We distinguish two classes of patterns, 
relative to a square window of a predetermined size, W. 

1. A symbol is defined as a connected region consisting of black pels 
and completely surrounded by white pels, such that it can completely 
fit into the window. 

2. A nonsymbol is defined as a windowed portion of a black con- 
nected region that is larger than the window. 

Usually, characters and small graphics elements can be represented 
as symbols, while lines and larger figures can be decomposed into 
nonsymbols. The decomposed figures can be later reconstructed by 
taking the union of the nonsymbols. The nonsymbols do not have to 
be disjoint, and a better compression may sometimes result from a 
decomposition into overlapping nonsymbols. 
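
The distinction between the two classes can be sketched as a size test on a connected black region. This is a simplified illustration, not the isolation procedure itself: it uses a flood fill over 8-connected pels instead of contour tracing, and the function name and picture layout (a list of rows of 0/1 values) are assumptions.

```python
from collections import deque


def classify_region(picture, start, window):
    """Classify the 8-connected black region containing `start` (row, col).

    Returns ("symbol" | "nonsymbol", (top, left, bottom, right)).
    A region is a symbol if its bounding box fits in a window x window square.
    """
    rows, cols = len(picture), len(picture[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and picture[nr][nc] == 1 and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    top = min(r for r, _ in seen)
    bottom = max(r for r, _ in seen)
    left = min(c for _, c in seen)
    right = max(c for _, c in seen)
    fits = (bottom - top + 1) <= window and (right - left + 1) <= window
    return ("symbol" if fits else "nonsymbol"), (top, left, bottom, right)
```

A region classified as a nonsymbol here would, in the scheme described above, be cut into window-sized portions rather than kept whole.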

Decomposing large figures into nonsymbols allows us to use match- 
ing techniques to compress graphical information, as well as text. A 
figure can be decomposed in many ways, and the compression that 
results from grouping similar nonsymbols usually depends on the 
decomposition. The final compression, or the number of different 
classes of nonsymbols, can be used as a measure of quality of the 
decomposition, and one may try to find the best decomposition with 
respect to such measures. Finding the optimal decomposition, however, 
may be computationally quite complex (we do not know of any related 
study) and it would certainly require many passes through a figure. At 
present, we use a one-pass isolation procedure, which allows us to 
keep the computation within reasonable bounds. 

The isolation procedure repeatedly isolates and removes the upper- 
left portion of a black region, up to a maximum size allowed by the 
window. If the isolated pattern has no black pel extensions, then it is 
a symbol; otherwise it is a nonsymbol. 

The isolation algorithm operates on a two-dimensional one-bit array 
containing the original picture. The picture memory is scanned line 


2516 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1983 


by line from the upper-left element. When a black pel is found, the 
procedure attempts to trace the boundary of a black region, clockwise. 
The tracing algorithm is a standard one; however, we describe it here 
for further reference. Let us call the first black pel (x_1, y_1). The 
neighbors (adjacent pels in eight directions) of (x_1, y_1) are examined, 
beginning at (x_1 + 1, y_1) and searching clockwise around (x_1, y_1) up 
to (x_1 − 1, y_1 + 1). If a black pel is found, it becomes the second pel 
of the contour, (x_2, y_2); otherwise (x_1, y_1) is erased from the picture 
memory (single pels are neglected) and the scan continues. Each 
subsequent pel of the contour is found by searching around the current 
pel (x_i, y_i), beginning two steps clockwise from the previous pel 
(x_{i−1}, y_{i−1}) (Fig. 2). The contour trace ends when it returns to the 
first pel in such a way that the next pel would be (x_2, y_2). The tracing 
algorithm checks for the limits of the picture array and it maintains a 
window. Pels beyond the limits of the picture array and those outside 
of the current window are always treated as white (0 valued). The 
purpose of the window is to restrict the maximal size of an isolated 
pattern to W × W. The window is initially set to a size 2W × W, and 
positioned in such a way that (x_1, y_1) is in the center of its upper edge. 
When the traced part of the boundary reaches a width of W, the 
window is reset to a size W × W, and it is placed over the boundary 
part that has been traced, such that (x_1, y_1) is still at the upper edge of 
the window (Fig. 3). 

The tracing of the boundary is recorded in a two-dimensional one- 
bit array S in the following way. When the search around the current 
boundary pel (x_i, y_i) goes past the pel (x_i + 1, y_i), a 1 is put in 
S(x_i + 1, y_i). If the search goes past the element (x_i − 1, y_i), then 
a 1 is put in S(x_i, y_i). All the elements of S are initially set to 0. 
The information in S (Fig. 4), after the trace termination, completely 
represents the boundary (it is a form of run-length code). The pattern 
now can be isolated by copying and erasing the portion of the picture 
that is enclosed by the boundary (including the boundary). This is 
accomplished using the information in the array S. For any row of S, 
let S_1, S_2, ..., S_n be the positions (x-coordinates) of 1-valued 
elements in a row. The number n is always even, which is a property of 
the boundary encoding that we use. For every row of S, the pixels of a 
corresponding row of the picture memory between S_1 and S_2, S_3 and 
S_4, etc., are copied to another array, and set to 0 in the picture 
memory, including S_1, S_3, ... and excluding S_2, S_4, ... . The pattern 
is now isolated and erased from the picture memory. While the 
isolation algorithm described above always works correctly, i.e., it 
isolates symbols and completely decomposes large figures into 
nonsymbols, it does not attempt in any way to optimize the 
decomposition, so the results are not always pleasing. 

Fig. 4—Contour encoding in array S. 
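
The copy-and-erase step for one row can be sketched as follows. The function name and the list-of-bits representation are illustrative assumptions; the pairing of marks into spans that include the odd-numbered positions and exclude the even-numbered ones follows the description above.

```python
def extract_row(picture_row, s_row):
    """Copy and erase the pels of one row that lie inside the traced boundary.

    The 1-valued positions in s_row are paired as (S_1, S_2), (S_3, S_4), ...
    Each pair is treated as a half-open span: S_1 included, S_2 excluded.
    picture_row is modified in place; the copied pels are returned.
    """
    marks = [x for x, bit in enumerate(s_row) if bit == 1]
    if len(marks) % 2:
        raise ValueError("boundary encoding should give an even number of marks")
    copied = [0] * len(picture_row)
    for start, stop in zip(marks[0::2], marks[1::2]):
        for x in range(start, stop):
            copied[x] = picture_row[x]
            picture_row[x] = 0
    return copied


row = [0, 1, 0, 1, 0, 1, 1, 0]
marks = [0, 1, 0, 0, 1, 0, 0, 0]   # one span covering x = 1, 2, 3
print(extract_row(row, marks), row)
```

White pels inside a span (holes) are copied along with the black ones, so the isolated pattern keeps its interior detail, while black pels outside every span stay in the picture memory.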

To improve the decomposition in cases commonly occurring in 
graphics, we have added two extensions to the basic isolation scheme: 

1. L-pattern suppression 

L-pattern suppression improves the segmentation of large blobs that 
otherwise may generate many dissimilar nonsymbols (Fig. 5). This 
extension is implemented in the tracing phase of the isolation algo- 
rithm as follows: If the beginning part of the traced boundary goes 
straight down from either the first or the second pel over more than k 
(currently k = 10) pels, then an attempt to turn immediately to the right 
resets the lower edge of the window to the last pel before the right 
turn, so the boundary is forced to turn left (see Fig. 6). 

2. Cross decomposition 

If the isolated pattern can be represented as an intersection of a 
horizontal and a vertical line segment (a cross), then each segment 
becomes a separate pattern. This is implemented by comparing each 
isolated pattern (with the matching technique described in Section IV) 
to a cross formed by intersecting this pattern with vertical and horizontal 
lines one pel from the edges of the final window (Fig. 7). If a sufficiently 
close match is found, then one of the line segments from the cross is 
returned to the picture memory, while the other replaces the isolated 
pattern. This extension reduces the number of patterns generated by 
line crossings in grids and tables (Fig. 8). 

Fig. 5—Improvement in segmentation due to L-pattern suppression. (a) Before 
suppression. (b) After suppression. 

Fig. 6—L-pattern suppression. When tracing reaches the corner, it is forced to follow 
the dashed line. 

The basic isolation algorithm is similar to the region extraction 
method of Dudani‘*, but in contrast to the latter it does not need to 
store and process a list of boundary points, and it extracts regions 
containing holes in one pass. This algorithm can be shown to work 
correctly in every case and it is well suited to a hardware 
implementation. The extensions of the basic algorithm are heuristic in 
nature, but they considerably improve the decomposition of large 
regions. Examples of such improvements are shown in Figs. 5 and 8. 
Additional improvements may be possible at some increase of the 
computational cost. 


IV. MATCHING 


The matching includes all the processes necessary to know whether 
an incoming pattern matches any of the library patterns. In this 
system, we divide the matching into three parts. 

1. The screening unit makes a selection of the library patterns, and 
directs for template matching only those library patterns that might 
match. 

2. The template matcher creates a new binary picture called the error 
picture, containing black pels or 1’s in the locations where the two 
template-matched patterns are dissimilar. 

3. The matching decision process uses the error pictures and other 
information to decide whether a correct match has occurred. 

Fig. 7—Forming an intersection pattern. 

Fig. 8—Patterns resulting from grid segmentation. (a) Before cross decomposition. 
(b) After cross decomposition. 


4.1 Screening 


The purpose of the screening is to reduce the time-consuming task 
of the matcher. It should direct to the template matcher only the 
library patterns that might match the incoming (unknown) pattern. 
The screening is obtained by measuring some characteristics of the 
patterns, called features, and comparing them. The features must be 
easy to compute and compare, and also must form an easily classifiable 
space. The digitization of a facsimile adds much noise to a pattern. To 
get an efficient screening, the features must also be relatively noise 
independent. Four features were chosen for the screening. Two of 
them are obvious: the pattern length and the pattern height. The two 
others are the number of horizontal and the number of vertical white 
runs enclosed in the pattern. They are characteristics of the inside of 
a pattern, separating, for example, c from e or o. The chosen features 
are shown in Fig. 9. The straightforward feature “number of black 
pels” was found to be of little use because of its high variability and 
dependency upon the other features. 
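
The four features can be sketched as below. The text does not define "enclosed white run" precisely; the sketch assumes it means a run of white pels with a black pel on both sides within the same row (for horizontal runs) or column (for vertical runs), and that the pattern is a rectangular list of rows cropped to its bounding box. The function names are illustrative.

```python
def enclosed_white_runs(line):
    """Count runs of 0s that have a 1 on both sides within one row or column."""
    blacks = [i for i, pel in enumerate(line) if pel]
    if not blacks:
        return 0
    inner = line[blacks[0]:blacks[-1] + 1]   # starts and ends with a black pel
    runs, previous = 0, 1
    for pel in inner:
        if pel == 0 and previous == 1:
            runs += 1
        previous = pel
    return runs


def screening_features(pattern):
    """Return (length, height, horizontal white runs, vertical white runs)."""
    height, length = len(pattern), len(pattern[0])
    h_runs = sum(enclosed_white_runs(row) for row in pattern)
    v_runs = sum(enclosed_white_runs(col) for col in zip(*pattern))
    return length, height, h_runs, v_runs


letter_o = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
letter_c = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
print(screening_features(letter_o), screening_features(letter_c))
```

In this toy case the two shapes have the same length and height but different run counts, which is the kind of separation the run features are meant to provide.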

The screening process also must decide in which order to send the 
library patterns to the matcher. The most probable match should be 
sent first, to reduce the number of matches. The probability of a match 
between patterns depends not only on the similarity of their features, 
but also on the probability of occurrence of a library pattern. For 
example, an incoming pattern having the same feature distance to an 
O and a Q is much more likely to match the O than the Q since O is 
much more frequent than Q. The screening takes into account both 
the feature similarity and the probability of occurrence of a library 
pattern. We consider the probability of occurrence by sorting the 
library patterns according to the number of times they have matched 
(see Section 5.2.1). We take the feature distance into account by 
allowing for each feature only a fixed margin between the two patterns. 
The margin must be wide enough not to preclude any correct match 
and tight enough to reduce the number of template matches. A two- 
pass screening was found very efficient. In the first screening, only 
library patterns with features very similar to those of the incoming 
patterns are sent to the template matcher. A second, much looser, 
screening is applied only in the few cases where no match occurred. 
Fig. 9—Features chosen for screening. Horizontal lines indicate which runs are 
included in count of horizontal runs. Vertical lines indicate which runs are included in 
count of vertical runs. Horizontal run count is six and vertical run count is 15. 


The screening and the sorting are very efficient in reducing the number 
of matches. For example, for a typewritten document, the average 
number of matches per incoming pattern is reduced to 2.5, compared 
to 25 without screening and sorting. 
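
A two-pass screening with fixed per-feature margins might look like the sketch below. The margin values (1 and 3) and the function names are hypothetical; the text gives no numbers. The library is assumed to be a list of feature tuples already ordered by decreasing usage, so candidates come out most-probable first.

```python
def screen(incoming, library, margin):
    """Indices of library entries whose every feature is within `margin` of incoming."""
    return [i for i, features in enumerate(library)
            if all(abs(a - b) <= margin for a, b in zip(incoming, features))]


def two_pass_candidates(incoming, library, tight=1, loose=3):
    """Return (first-pass candidates, extra candidates for the looser second pass)."""
    first = screen(incoming, library, tight)
    second = [i for i in screen(incoming, library, loose) if i not in first]
    return first, second


library = [(10, 12, 2, 3), (10, 12, 0, 3), (14, 12, 2, 3), (30, 30, 9, 9)]
print(two_pass_candidates((10, 12, 2, 2), library))
```

The second list would be tried only when none of the first-pass candidates produces a correct match.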


4.2 Template matching 


The template matcher creates a new picture, called the error picture, 
which contains 1’s in the locations where the two patterns are differ- 
ent. The error picture is obtained simply by superimposing the two 
patterns and taking the “exclusive or” of the corresponding pels. Figure 
10 is an example of matching two patterns of the same character, 
while Fig. 11 shows the matching of two unlike patterns. Two patterns 
are always matched nine times, allowing the displacement of one 
pattern relative to the other by ±1 pel in both the horizontal and 
vertical directions. 
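
The nine template matchings can be sketched as an exclusive-or over a frame one pel larger than the patterns on every side. The function names and the alignment of the two patterns at their upper-left corners are assumptions; the text does not say how patterns are registered before the ±1 displacements are applied.

```python
def pel(pattern, r, c):
    """Value of a pel, treating everything outside the pattern as white (0)."""
    if 0 <= r < len(pattern) and 0 <= c < len(pattern[r]):
        return pattern[r][c]
    return 0


def error_picture(a, b, dx=0, dy=0):
    """XOR of pattern a with pattern b displaced by (dx, dy) pels."""
    height = max(len(a), len(b))
    width = max(len(a[0]), len(b[0]))
    return [[pel(a, r, c) ^ pel(b, r - dy, c - dx)
             for c in range(-1, width + 1)]
            for r in range(-1, height + 1)]


def all_error_pictures(a, b):
    """Error pictures for the nine displacements with dx, dy in {-1, 0, +1}."""
    return {(dx, dy): error_picture(a, b, dx, dy)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)}


def error_count(picture):
    """Number of 1s in an error picture."""
    return sum(map(sum, picture))
```

The plain error count is used later only to choose among displacements that have already been accepted, not to make the accept/reject decision.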


4.3 Matching decision 


The matching decision unit must process the error picture to detect 
whether there is a correct match, and to decide which relative position 
of the library pattern gives the best match. 

The straightforward approach is to count the number of errors (or 
1’s) in the error picture and to threshold it to make the decision. Such 
a technique would lead to many mismatches or many undetected 
matches, since, as shown in Ref. 2, the error count for two patterns 
corresponding to the same character is sometimes higher than the 
count for two patterns corresponding to different characters. This is 
caused by the digitization noise. Figure 10 shows that the template 
matching of two patterns of the same character gives relatively ran- 
domly distributed errors. Figure 11 shows that in the case of patterns 
of different characters, a cluster of errors appears where there are 
morphological differences between patterns. 

Fig. 10—Template matching of two similar patterns, with (a) and (b) original patterns 
and (c) error picture. 

Fig. 11—Template matching of two different patterns, with (a) and (b) original 
patterns and (c) error picture. 

As Ref. 2 shows, we could apply a weighted error count where the 
weight of an error is equal to the number of error pels among its eight 
neighbors. Single errors are erased and the maximum weight is eight. 
Figure 12 gives the weighted error pictures from the error pictures of 
Figs. 10 and 11. The weighted error count is not sufficient for the 
matching decision, as shown by Fig. 13. We must look at local error 
patterns to make the decision. The reason is that it is the local 
characteristics of the pattern that indicate whether two patterns are 
the same. Therefore, any decision made upon a count or integration 
may be incorrect. The matching decision described below uses only 
local measures and is also made locally with the simple rule that the 
match is considered correct if no local rejections are detected during a 
template matching. 

Fig. 12—Weighted error pictures. (a) Weighted error count is 18 in Fig. 10 and (b) 
144 in Fig. 11. 

Fig. 13—A weighted error count matching criterion leads to a mismatch, with (a) and 
(b) original patterns, and (c) weighted error picture. 

The following decision rule is used. A match is rejected if: 

Condition 1: An error pel has a weight of 4 or more, or 

Condition 2: (a) an error pel has a weight of 2 or more, (b) at least 
two of its neighboring error pels are not connected, and (c) one of the 
two pels from the patterns used to obtain the error pel has a weight 
of 0 or 8 (corresponding to 0 or 8 surrounding black pels). 

Most mismatches are detected by Condition 1, but Condition 2 is 
necessary in order to reject, for example, the possible match of an e 
and a c shown in Fig. 13. It is easy to see that Condition 2a is not 
necessary since it is included in 2b, but Condition 2a reduces the 
computation. 
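
The error weights and Condition 1 can be sketched as follows. Only Condition 1 is implemented; Condition 2 needs a connectivity test on the neighboring error pels and the weights of the original pattern pels, which the sketch leaves out. The function names are illustrative, and the error picture is assumed to be a rectangular list of rows of 0/1 values.

```python
def error_weights(error_picture):
    """Weight of each error pel = number of error pels among its eight neighbors.

    Non-error pels, and isolated error pels, get weight 0.
    """
    rows, cols = len(error_picture), len(error_picture[0])
    weights = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if error_picture[r][c]:
                weights[r][c] = sum(
                    error_picture[nr][nc]
                    for nr in range(max(r - 1, 0), min(r + 2, rows))
                    for nc in range(max(c - 1, 0), min(c + 2, cols))
                    if (nr, nc) != (r, c))
    return weights


def rejected_by_condition_1(error_picture, limit=4):
    """True if any error pel has a weight of `limit` or more."""
    return any(w >= limit for row in error_weights(error_picture) for w in row)
```

Because the test is local, a real implementation could stop at the first pel that triggers a rejection instead of computing the whole weight map.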

With these matching criteria, no visible mismatches have been 
detected, except slight distortion in line drawings. It is important to 
notice that a rejection can often be detected after processing a small 
fraction of the error picture. A matching decision made at the same 
time as the template matching would lead to an early abortion of 
template matchings and thus reduce the computation. 

When a correct match is detected, several relative positions some- 
times give a correct match. The chosen relative position will be the 
one with the lowest error count. The best relative position will decide 
where the library pattern will be put to replace the incoming pattern. 


V. CODING 


Contrary to many conventional facsimile coding techniques, we 
must code several different kinds of events and design several separate 
code books. The code for a pattern includes the position and the 
description of the pattern. The description is usually its library iden- 
tification, or in the case of a new pattern, its complete description. 
The coding procedure is described here for the size of the International 
Telegraph and Telephone Consultative Committee (CCITT) test fac- 
similes having 1728 pels per line and 2376 lines, but it can easily be 
modified for other cases. 


5.1 Coding of the position of the pattern 


To obtain a good-quality reproduction with pattern matching, we 
must position the patterns accurately. Considering the CCITT test 
documents, 23 bits are necessary for an absolute fixed length coding 
(11 bits horizontally, 12 bits vertically). We choose to transmit the 
horizontal position uncoded (11 bits) because variable-length run- 
length coding would lead only to a slightly smaller coding length (typi- 
cally 1 to 1.5 fewer bits/pattern) since the horizontal distance between 
patterns is large. Also since the absolute horizontal position is coded, 
the patterns can be transmitted in a nonsequential order, which, as 
shown later, leads to a significant decrease in the average coding 
length for the library identification code words. It should be noted 
that with 1728 pels/line and an 11-bit code word, the code words 
starting with 111 are not used and therefore can be used as special 
code words. 
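
The claim about the unused code words can be checked directly. The sketch assumes horizontal positions are numbered from 0 to 1727 and written as plain 11-bit binary; the function name is illustrative.

```python
LINE_WIDTH = 1728  # pels per line in the CCITT test facsimiles


def horizontal_code(x: int) -> str:
    """11-bit fixed-length code word for a horizontal position (assumed 0-based)."""
    if not 0 <= x < LINE_WIDTH:
        raise ValueError("position outside the scan line")
    return format(x, "011b")


# Words beginning with 111 would encode 1792..2047, which never occur.
unused = all(not horizontal_code(x).startswith("111") for x in range(LINE_WIDTH))
print(horizontal_code(231), unused)
```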

We code the vertical position of the patterns in the following way: 

1. A mode bit is sent at the beginning of each line to indicate 
whether there are any patterns starting on that line. 

2. If there are no patterns on the line, operation 1 is repeated on 
the next line. 

3. If there are patterns on a line, they are all coded. The special 
horizontal code word 111 indicates that there are no more patterns on 
the line and that the next line can be considered. 

4. When a pattern is replaced by a library pattern, the position of 
the library pattern might be moved up or down by one line. Therefore, 
after the library identification has been coded, the code words 10 and 
11 are used to position the library pattern up or down, while code word 
0 is used to indicate no vertical displacement. No vertical displacement 
code word is sent with a new library pattern, since there are no changes 
in vertical position. 
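
The four operations above can be assembled into a per-line message as sketched below. Several details are assumptions not fixed by the text: the mode bit values (0 for an empty line, 1 otherwise), which of 10 and 11 means up and which means down, and the omission of the binary description that would follow a new library pattern. The identification bits are passed in as ready-made strings.

```python
# Assumption: 10 moves the library pattern up one line, 11 moves it down one line.
VERTICAL_SHIFT = {0: "0", -1: "10", +1: "11"}


def encode_line(patterns):
    """Encode one scan line.

    patterns: list of (x, id_bits, shift); shift is None for a new library
    pattern, which carries no vertical displacement code word.
    """
    if not patterns:
        return "0"                        # assumed mode bit: no patterns on this line
    bits = ["1"]                          # assumed mode bit: patterns follow
    for x, id_bits, shift in patterns:
        bits.append(format(x, "011b"))    # 11-bit horizontal position
        bits.append(id_bits)              # pattern identification code word
        if shift is not None:
            bits.append(VERTICAL_SHIFT[shift])
    bits.append("111")                    # special word: no more patterns on the line
    return "".join(bits)
```

Since no valid horizontal position begins with 111, a decoder reading the next 11-bit field can recognize the end of the line from its first three bits.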

Figure 14 shows examples of the message format for the pattern 
positioning. 


5.2 Coding of the pattern identification 


The coder must send a pattern identification word with each pattern. 
We can transmit the pattern number uncoded. It requires, for example, 
seven bits in the case of a library size of 128 and nine bits in the case 
of a library size of 512. The coding procedure used here will lead to an 
average coding length of the pattern identification of fewer than five 
bits/pattern. It will be obtained by a continuous library updating and 
by variable-length coding. 


5.2.1 Library updating and management 


The library management and updating is done for the following 
purposes: 

1. Accept new library patterns, and if necessary, delete a seldom 
used library pattern to make room for the new one. 

2. Organize the library for the fastest match, taking into account 
the screening and matching procedures. 

3. Organize the library for minimum average library identification 
coding length. 

All three require the same processing: to keep track of the number 
of times each library pattern is used. By ordering the library patterns 
in order of decreasing usage, the correct match will be obtained rapidly, 
since the most used library patterns will be accessed first. An efficient 
coding of the patterns’ identification is obtained by giving short code 
words to the first patterns in the list. The last pattern in the list, 
which is one of the least used patterns, can be deleted to make room 
for a new one. 

Fig. 14—Coding of positions of patterns. Two lines have no patterns, then a line has 
three patterns; the first on position 231 is replaced by a library pattern, the second on 
position 1532 is a new library pattern. There are no patterns on the next line. 

The updating must be deterministic and use no future information, 
since the receiver must make the same updating to decode correctly. 

The updating rule of the patterns in the library is as follows: 

1. When a pattern matches a library pattern number K, that library 
pattern is moved to number K/2 and all the pattern numbers from K/ 
2 to K-1 are increased by 1. 

2. When a new pattern is added to the library, it gets number N/2 
where N is the total number of library patterns. The patterns with 
numbers from N/2 to N will be increased by 1, and if N is equal to 
the maximum number of library patterns M, the library pattern with 
number N + 1 is dropped. 

This updating procedure was found to efficiently give low identifi- 
cation numbers to often used patterns and high numbers to seldom 
used patterns. If M is the maximum number of library patterns, it 
guarantees that a new library pattern will stay in the library for at least 
M/2 matches, but generally for many more. 
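
The two updating rules can be sketched on a Python list in which index 0 holds library pattern number 1. The text does not say how K/2 and N/2 are rounded; the sketch assumes integer division with a floor of 1. The function names are illustrative.

```python
def on_match(library, k):
    """Rule 1: move matched pattern number k (1-based) to number k/2."""
    target = max(1, k // 2)              # assumed rounding: floor, never below 1
    pattern = library.pop(k - 1)
    library.insert(target - 1, pattern)  # patterns target..k-1 shift up by one


def on_new(library, pattern, max_size):
    """Rule 2: insert a new pattern at number N/2; drop the last one on overflow."""
    target = max(1, len(library) // 2)
    library.insert(target - 1, pattern)
    if len(library) > max_size:
        library.pop()                    # the pattern pushed to number N + 1


library = list("abcdefgh")
on_match(library, 6)                     # 'f' moves from number 6 to number 3
on_new(library, "z", max_size=8)         # 'z' enters at number 4, 'h' is dropped
print("".join(library))
```

Both operations depend only on past events, so a receiver applying the same two functions stays synchronized with the coder.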


5.2.2 Pattern identification coding table 


The pattern identification coding table includes two special code 
words: “new pattern” and “same pattern.” They are added to increase 
the coding efficiency. The “new pattern” code word is chosen because 
it is not necessary for a new library pattern to send an identification 
number, since the decoder uses the same rule as the coder to assign 
the identification number to the new pattern. The “same pattern” 
code word indicates that the transmitted pattern is the same as the 
previously transmitted pattern. It is useful particularly for typewritten 
text where the line-by-line search for a pattern often detects the same 
pattern (character). 

The coding table for the pattern identification is given in Table I 
for a pattern library with a maximum of 512 patterns. 

This code leads to an average library identification length of fewer 
than seven bits, compared to nine bits with a fixed-length code. The next 
section shows a more efficient coding procedure. 
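
The code words of Table I can be generated as sketched below. The table gives only the prefixes and the number of free (X) bits; the sketch assumes the free bits hold the offset of the library number from the start of its range, written in binary. The names are illustrative.

```python
# (first number, last number, prefix, number of free bits), from Table I
TABLE_I = [
    (1, 16, "1", 4),
    (17, 32, "010", 4),
    (33, 64, "0011", 5),
    (65, 128, "00101", 6),
    (129, 512, "011", 9),
]
SAME_PATTERN = "000"
NEW_PATTERN = "00100"


def id_code(number: int) -> str:
    """Variable-length code word for a library pattern number in 1..512."""
    for first, last, prefix, width in TABLE_I:
        if first <= number <= last:
            # Assumption: the X bits are the offset within the range.
            return prefix + format(number - first, f"0{width}b")
    raise ValueError("library number out of range")


print(id_code(1), id_code(23), len(id_code(512)))
```

No code word in the table is a prefix of another, so the identification field can be decoded without a length marker.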


5.2.3 Pattern identification coding by sorting 


Since an absolute code gives the horizontal position of a pattern, it 
is possible to transmit the patterns detected along a line in any order. 
The only condition is that the library updating be done at the end of 
the line. The average coding length of the library identification is 
reduced to fewer than five bits, by sorting the patterns on a line 
according to their library number. That is because: 

1. Many of the patterns are the same. 

2. The library pattern identification number is run-length coded 
(only the increase compared to the previous identification number is 
coded). 

3. The new library patterns are sent at the end of the line; therefore, 
the new pattern code word is sent only once, since any further patterns 
on the line are automatically new patterns. 

Table I—Coding table for identification of library patterns 

Symbol                     Code Word       Code Word Length 
Same pattern               000             3 
Library pattern 1-16       1XXXX           5 
New pattern                00100           5 
Library pattern 17-32      010XXXX         7 
Library pattern 33-64      0011XXXXX       9 
Library pattern 65-128     00101XXXXXX     11 
Library pattern 129-512    011XXXXXXXXX    12 

This can be illustrated by an example. Let a line have the following 
patterns: pattern 23, new pattern; pattern 28, same; pattern 23, new 
pattern. By looking at Table I, the coding length is 7 + 5 + 7 + 3 + 7 
+ 5 = 34 bits. With sorting, the patterns become: pattern 23, same; 
pattern 28, same; new pattern; new pattern. The coding length is 7 + 
3 + 5 + 3 + 5 + 0 = 23 bits. It should be noted in this example that 
pattern 28 is coded as pattern 5 since only the increase in identification 
number compared to the previous pattern is coded. 
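
The two totals in the example can be reproduced with a short calculation. The sketch counts only identification bits, using the code-word lengths of Table I; the function names and the representation of a line as a list of library numbers and "new" markers are illustrative.

```python
SAME_BITS, NEW_BITS = 3, 5


def id_bits(number):
    """Code-word length from Table I for a library number (or an increase) in 1..512."""
    for last, bits in ((16, 5), (32, 7), (64, 9), (128, 11), (512, 12)):
        if 1 <= number <= last:
            return bits
    raise ValueError(f"no code word for {number}")


def bits_in_scan_order(patterns):
    """Identification bits when patterns are sent in the order they were found."""
    total, previous = 0, None
    for p in patterns:
        if p == "new":
            total += NEW_BITS
        elif p == previous:
            total += SAME_BITS
        else:
            total += id_bits(p)
        previous = p
    return total


def bits_after_sorting(patterns):
    """Identification bits when matched patterns are sorted and new ones sent last."""
    numbers = sorted(p for p in patterns if p != "new")
    total, previous = 0, 0
    for n in numbers:
        if n == previous:
            total += SAME_BITS
        else:
            total += id_bits(n - previous)   # only the increase is coded
            previous = n
    if len(numbers) < len(patterns):
        total += NEW_BITS                    # "new pattern" is sent only once
    return total


line = [23, "new", 28, 28, 23, "new"]
print(bits_in_scan_order(line), bits_after_sorting(line))
```

For the line in the example this gives 34 bits in scan order and 23 bits after sorting.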

The library updating is done at the end of each line. This creates 
problems when accepting new library patterns. They must be added 
immediately to the top of the library, since the position of the other 
patterns should not be changed. It is also not possible to delete patterns 
to make room for the new ones. For that reason, before scanning a 
line, enough library patterns should be deleted to avoid an overflow of 
the pattern library. 


5.3 Coding of the library pattern description 


The size of a pattern is limited to 32 × 32 bits. The description 
starts with a 5-bit word, which indicates the height, H, of a pattern in 
binary. The length of a pattern is extended to 32 pels by filling the 
right end with 0’s. Therefore, there are 32 × H pels to code. For coding 
efficiency, one white pel (0) is added at the beginning. A coding line 
is made of the 32 × H + 1 pels considered in the raster scan order. 
The reference line is similar to the coded line except that all the pels 
are shifted to the right by 32 pels (one line). Therefore, a line is coded 
using the previous line as the reference. The line is then coded by the 
CCITT two-dimensional code,® with the only modification that the 
first code word, which is always the horizontal mode code word, is 
deleted, since it carries no information. To improve coding efficiency, 
the coder can switch between two modes for coding the library pattern 
description. The first mode, described above, is called “horizontal 
coding.” The other is called “vertical coding” and is 
the same as above except that the pattern is coded column after 
column from top to bottom. Therefore, in the vertical mode the 
description starts with a 5-bit word indicating the length of a pattern. 
A header bit indicates which mode is chosen, with a 0 for horizontal 
mode and a 1 for vertical mode. We could also code the pattern 
description using a code better matched to the source. This would 
reduce the coding length, but at the expense of requiring a specific 
code in place of a standard code. 
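
The layout of the coding line and its reference line can be sketched as follows. This is a minimal illustration under stated assumptions: a pattern is a list of rows of 0/1 pels, the function names `horizontal_lines` and `transpose` are hypothetical, and the vacated start of the reference line is filled with white pels. The CCITT two-dimensional coding step itself is not shown.

```python
WIDTH = 32  # maximum pattern width in pels


def horizontal_lines(rows):
    """Return (coding_line, reference_line) for a pattern given as a list
    of rows, each a list of 0/1 pels at most WIDTH long."""
    coding = [0]  # one white pel added at the beginning
    for row in rows:
        if len(row) > WIDTH:
            raise ValueError("pattern wider than 32 pels")
        coding.extend(list(row) + [0] * (WIDTH - len(row)))  # pad right with 0's
    # Same pels shifted right by one pattern line (32 pels); the white fill
    # at the start is an assumption.
    reference = [0] * WIDTH + coding[:-WIDTH]
    return coding, reference


def transpose(rows):
    """Columns of the pattern, top to bottom, as used by the vertical mode."""
    width = max(len(row) for row in rows)
    padded = [list(row) + [0] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


pattern = [[1, 1, 0],
           [0, 1, 1]]
coding, reference = horizontal_lines(pattern)
print(len(coding), len(reference))
```

For the vertical mode, the same layout would be applied to `transpose(pattern)`, and a coder could keep whichever of the two descriptions turns out shorter.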


5.4 Coding summary 


The coding procedure can be summarized in the following way: 

1. All the patterns isolated along a scan line are matched. 

2. At the end of the line, the matched patterns are sorted in order 
of increasing pattern identification number. The new library patterns 
are added at the end in sequential order. 

3. The patterns are coded and transmitted with the information 
sent in the following order: 

a. Horizontal position of pattern. 

b. Pattern identification. If it is a new pattern, the identification 
is sent only for the first new pattern on the line. 

c. A 1- or 2-bit code word to specify the vertical shift of a pattern, 
except if it is a new library pattern. 

d. For a new library pattern the following bits are sent: (1) a header 
bit indicating whether the horizontal or vertical coding mode is 
chosen, (2) a 5-bit word indicating the number of lines of the 
pattern to be coded, and (3) the CCITT two-dimensional coding 
of the pattern (see 5.2). 

e. After all patterns on a line have been sent, the special horizontal 
code word 111 indicates the end of the line. 

f. The library update is made according to 5.2.3. The patterns are 
updated in order of increasing identification number. After 
updating, all patterns with a number greater than 480 are 
deleted, thus allowing for at least 32 new library patterns to be 
added on the next line. 

Figure 15 is an example of message transmission. The different code 
words are summarized in Table II. 

Fig. 15. Message transmission. There are two lines without patterns; next, pattern 23 is in position 936, the same pattern is in position 1486, pattern 
28 is in position 416, and the same pattern is in position 1231; two new patterns are in positions 249 and 998; and there are no patterns on the next line. 
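
The ordering of step 2 and the library trimming of step 3f can be sketched as below, using the positions quoted for Fig. 15. The tuple representation, the function names, and the assumption that patterns are isolated in order of horizontal position are illustrative choices, not details given in the text.

```python
MAX_KEPT = 480  # patterns numbered above this are deleted after each update


def order_for_transmission(patterns):
    """patterns: list of (horizontal_position, library_number) in scan order,
    with library_number None for a new library pattern.
    Matched patterns are sorted by increasing library number (a stable sort
    keeps scan order among equal numbers); new patterns follow in scan order."""
    matched = [p for p in patterns if p[1] is not None]
    new = [p for p in patterns if p[1] is None]
    matched.sort(key=lambda p: p[1])
    return matched + new


def trim_library(library):
    """library: dict mapping library number -> pattern. Keeps numbers up to
    480 so that at least 32 of the 512 slots are free for the next line."""
    return {n: pat for n, pat in library.items() if n <= MAX_KEPT}


scan_order = [(249, None), (416, 28), (936, 23),
              (998, None), (1231, 28), (1486, 23)]
for position, number in order_for_transmission(scan_order):
    print(position, "new" if number is None else number)
```

Because the absolute horizontal position is sent with every pattern, the decoder can place the patterns correctly regardless of the order in which they arrive.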


Table II. Description of the code words for pattern matching coding 

Code Definition                      Word Size  Description 
Mode bit                             1          Indicates whether there are any patterns on the line. 
Horizontal position                  11         Gives in binary the absolute position of a pattern. 
No more pattern                      3          Indicates that there are no more patterns on the line 
                                                (this code word, 111, is a special horizontal position 
                                                code word). 
Vertical move of pattern             1 or 2     Indicates whether the pattern must be moved up or down 
                                                by one line or is not moved. 
Library identification code          Variable   Defines which library pattern is transmitted. 
Library pattern description header   3          Indicates whether the library pattern is coded in 
                                                horizontal or vertical mode. 
Library pattern size                 5 
Library pattern description          Variable   Slightly modified CCITT two-dimensional code. 


VI. SIMULATION RESULTS 


The important criteria are the compression and the quality of the 
received documents. For that purpose, the set of eight CCITT facsimile 
documents is used. Their resolution is 7.7 pels/mm (200 pels/in.) in 
both the horizontal and vertical directions. They have 1728 pels/line 
and 2376 lines. Documents one, two, four, and five are shown in Fig. 
16. All eight documents are shown in Ref. 5. For accurate comparison 
with the matching technique by Pratt et al.,² the simulations were also 
made with an older nonofficial version of the CCITT documents, 
which is similar except that each document has 1728 pels/line and 2128 
lines. 


6.1 Facsimile quality 


In order to improve the quality of the decoded picture, a local 
filtering using a 3 X 3 window is applied. In addition, large library 
patterns are slightly expanded on their borders. This operation erases 
artifacts in large black regions. 

The encoding scheme modifies the binary picture. We must therefore 
verify that the alterations are not visible, or at least not annoying. 
We can consider three kinds of picture alteration: wrong matches, 
matches with a slightly distorted pattern, and wrong positioning. In a 
wrong match, a pattern is replaced by a different pattern. The only 
wrong matches detected are between pairs such as 0 and O, dot and 
comma, and I and 1, which even people cannot distinguish correctly 
without using the context. Therefore, the system can be considered to 
have practically no wrong matches. A match with a slightly distorted 
pattern can occur with characters: a character might match the same 
character in a different font, or a thinned or thickened version of the 
same character. Such matches, unlike wrong matches, are tolerable if 
they do not appear too often. They arise when two slightly different 
fonts are used on the same page or when the characters of a page come 
from a low-quality typewriter or scanner. Wrong positioning of a 
pattern decreases the quality of the received facsimile. No noticeable 
wrong positioning is observed for patterns such as characters or other 
symbols. Some visible wrong positioning is observed for nonsymbol 
patterns such as line segments, where the successive patterns make the 
lines slightly jagged. Figure 17 shows the same CCITT facsimiles as 
Fig. 16, but after transmission by pattern matching. It can be seen 
that there are no significant degradations. There are some slight 
irregularities in line drawings, as for example in Fig. 17d. A few 
distorted matches appear on CCITT document one (Fig. 17a). 

Fig. 16(a). Original CCITT document one (first 2000 lines). 

Fig. 16(b). Original CCITT document two (first 2000 lines). 


6.2 Compression 


To make an accurate comparison with both the symbol matching 
and two-dimensional coding techniques, the coding simulations have 
been made with both the official set of CCITT facsimile documents 
and a former nonofficial version often used for facsimile compression 
comparisons. Table III gives the coding lengths for the official and 
nonofficial sets of CCITT documents, respectively. They include the 
code length for each of the codes necessary for the pattern matching 
coding. Table IV gives the compression ratios for the same CCITT 
documents and compares them with the symbol-matching technique of 
Pratt et al.² and the two-dimensional CCITT code. The results are 
without any synchronization or stuffing bits, which is natural since 
pattern matching coding would be intended for future facsimile 
networks, such as group four facsimile machines, with fewer overhead 
bits. Therefore, the compressions of the two-dimensional CCITT code 
and symbol matching have been corrected by deleting the 
synchronization and stuffing bits and are different from their values 
given in Refs. 2 and 5. 

Fig. 16(c). Original CCITT document four (first 2000 lines). 

Fig. 16(d). Original CCITT document five (first 2000 lines). 

Fig. 17(a). Document one (first 2000 lines) after pattern matching. 

Fig. 17(b). Document two (first 2000 lines) after pattern matching. 

Fig. 17(c). Document four (first 2000 lines) after pattern matching. 

Fig. 17(d). Document five (first 2000 lines) after pattern matching. 

Very high compressions, up to 80, are obtained. The compression 
has often doubled compared to that of the two-dimensional CCITT 
code and is sometimes 4.8 times higher. The compression is, depending 
upon the documents, 20 to 80 percent higher than the compression 
derived from the symbol matching technique by Pratt et al.² More 
detailed comparisons and observations are useful when considering 
the performance of facsimile coding by pattern matching: 

1. An astonishing fact is the difference in compression observed 
between the old and the official versions of the CCITT documents. For 
documents three and five the compressions are nearly twice as high 
for the old version as for the official version. Significant discrepancies 
are also observed for documents one and eight. This is in spite 
of the fact that the old and official documents are the same except that 
they were scanned differently. It can also be noted that for the two- 
dimensional CCITT code, the difference in compression is smaller 
than five percent except for document eight, where the difference is 
about 20 percent. It must therefore be concluded that the performance 
of the pattern matching coding technique is much more dependent 
upon the scanning and binary thresholding. Observing both versions 
of documents three and five, the main difference is that in the official 
version, characters are often clustered together, which leads to 
incorrectly (or rather “nonconveniently”) isolated characters (as shown in 
Fig. 18a), while for the old version, the characters are rarely clustered 
together. However, sometimes a character in the old version is isolated 
into several patterns because not all its pels are connected (as shown 
in Fig. 18b). The old version of documents three and five should 
therefore have more patterns than the official version but fewer library 
patterns. In fact, old CCITT document three has 2199 patterns and 225 
library patterns, while the official version has 1945 patterns and 551 
library patterns. The coding length of a library pattern is much greater 
(by a factor of about 10) than that of a nonlibrary pattern, which 
explains the difference in compression ratios. It can be concluded that 
pattern matching is much more dependent on the scanning quality and 
the thresholding than two-dimensional facsimile codes. 
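
The factor of about 10 can be made plausible with a rough calculation from the pattern counts quoted above and the CCITT3 columns of Table III. The split used here is a simplifying assumption, not the authors' accounting: position, vertical-move, and identification bits are averaged over all patterns, and description bits over the library patterns only. The function name is hypothetical.

```python
def per_pattern_costs(patterns, library_patterns, position_bits,
                      vertical_bits, identification_bits, description_bits):
    """Return (average bits to place and identify any pattern,
    average bits to describe one library pattern)."""
    locating = (position_bits + vertical_bits + identification_bits) / patterns
    describing = description_bits / library_patterns
    return locating, describing


# CCITT document three: pattern counts from the text, bit counts from Table III.
official = per_pattern_costs(1945, 551, 21395, 1921, 9458, 120749)
old = per_pattern_costs(2199, 225, 24189, 2448, 11078, 29368)

for name, (locating, describing) in (("official", official), ("old", old)):
    print(f"{name}: {locating:.1f} bits per pattern, "
          f"{describing:.1f} bits per library description, "
          f"ratio {describing / locating:.1f}")
```

Under these assumptions the description of a library pattern costs roughly 8 to 13 times as much as placing and identifying a pattern, so the 326 extra library patterns of the official version dominate its coding length.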

Table III. Coding length in bits with pattern matching coding 

(a) CCITT documents (2376 lines, 1728 pels/line) 

                              CCITT1  CCITT2  CCITT3  CCITT4  CCITT5  CCITT6  CCITT7  CCITT8 
Mode bit                        2376    2376    2376    2376    2376    2376    2376    2376 
Horizontal position            11847    8096   21395   46464   23551   15356   36828   31240 
No more pattern                  888    1587    2193    1854    2235    1929    3165    3231 
Vertical move of pattern        1114     411    1921    5724    2242    1347    3427    3642 
Library identification code     4895    3749    9458   18584   10848    6426   19289   13085 
Library pattern description    43983   71427  120749   41138   94923   71959  204113  115415 
Total                          65103   87646  158092  116140  136175   99393  269198  168989 

(b) Older CCITT documents (same as documents used in Ref. 1) (2128 lines, 1728 pels/line) 

                              CCITT1  CCITT2  CCITT3  CCITT4  CCITT5  CCITT6  CCITT7  CCITT8 
Mode bit                        2128    2128    2128    2128    2128    2128    2128    2128 
Horizontal position            12342    7733   24189   49610   27511   16049   38566   30635 
No more pattern                  846    1533    2103    2574    2091    2007    3942    3168 
Vertical move of pattern        1155     463    2448    5726    3028    1411    3812    2441 
Library identification code     4958    3099   11078   21385   11940    6927   21842   12773 
Library pattern description    23254   62825   29368   26700   22441   55077  155681  122986 
Total                          44683   78281   71314  108123   69139   83599  225971  174131 


Table IV. Comparison of compression ratios 

(a) Official CCITT documents (2376 × 1728 pels) 

                                Two-Dimensional   Increase Versus Two- 
Picture   Pattern Matching      CCITT Code        Dimensional CCITT Code 
CCITT1    63.1                  28.3              122% 
CCITT2    46.8                  47.5              -1% 
CCITT3    26.0                  17.9              46% 
CCITT4    35.4                  7.4               378% 
CCITT5    30.2                  15.9              90% 
CCITT6    41.3                  30.8              34% 
CCITT7    15.3                  7.4               106% 
CCITT8    24.3                  26.9              -8% 

(b) Nonofficial CCITT documents (2128 × 1728 pels) 

          Pattern    Symbol     Two-Dimensional   Increase Versus    Increase Versus Two- 
Picture   Matching   Matching   CCITT Code        Symbol Matching    Dimensional CCITT Code 
CCITT1    82.3       63.1       28.6              30%                188% 
CCITT2    47.0       38.1       45.4              23%                3% 
CCITT3    51.6       32.4       18.5              59%                179% 
CCITT4    34.0       25.6       7.5               33%                353% 
CCITT5    53.2       33.6       16.5              58%                222% 
CCITT6    44.0       29.5       30.2              49%                46% 
CCITT7    16.3       9.0        7.1               81%                130% 
CCITT8    21.1       17.8       22.1              19%                -5% 
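
The pattern matching ratios of Table IV(a) appear to be the number of pels in a document divided by the total coding length of Table III(a). The sketch below reproduces them under that assumption; the function and constant names are hypothetical.

```python
PELS_PER_LINE = 1728
LINES = 2376  # official CCITT documents

# Total coding lengths in bits for CCITT1..CCITT8, from Table III(a).
TOTAL_BITS = [65103, 87646, 158092, 116140, 136175, 99393, 269198, 168989]


def compression_ratio(total_bits, lines=LINES, pels_per_line=PELS_PER_LINE):
    """Uncompressed size (one bit per pel) divided by the coded size."""
    return lines * pels_per_line / total_bits


def percent_increase(ratio, reference_ratio):
    """Relative gain of one compression ratio over another, in percent."""
    return 100 * (ratio / reference_ratio - 1)


ratios = [round(compression_ratio(bits), 1) for bits in TOTAL_BITS]
print(ratios)
print(round(percent_increase(35.4, 7.4)))  # document four versus the 2-D code
```

The same function with `lines=2128` applies to the nonofficial documents of Table III(b).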


Fig. 18. Characters isolated in an unwanted way. (a) Characters clustered together. 
(b) Text containing characters isolated into several symbols. 


2. The increase in compression ratio compared to the two-dimen- 
sional run-length code is quite variable. For documents containing 
mostly handwritten drawings and text, such as documents two and 
eight, there is sometimes a slight decrease in the compression ratio. 
That is because there are few matching patterns. For example, for 
document two, there are 736 patterns, but 448 of them are library 
patterns. For documents containing mostly text, such as document 
four, the compression ratio increases by a factor of about 4.5. For 
documents containing a mixture of text and drawings, the increase 
varies between 35 and 220 percent, depending on the content and 
thresholding. It should be noted that for document seven, which 
contains printed ideograms, the increase in compression ratio is 
smaller than for regular printed text because there are more ideograms 
than letters, but the compression ratio still doubled. 

3. The increase in compression ratio by pattern matching is 20 to 
80 percent compared to the symbol matching of Ref. 2. The increase 
has been obtained by a combination of several factors. The most 
important are (a) isolation of nonsymbols (which leads to a significant 
improvement for documents three, four, five, and six, but has a slight 
negative effect on documents two and eight), (b) better matching, 
leading to fewer library patterns, and (c) improved coding efficiency 
obtained by sorting the patterns and by other coding modifications. 

By looking at the coding length necessary for the different kinds of 
code words, in Table III, it is clear that the predominant part of the 
code is used for the description of the library patterns, accounting 
generally for more than 60 percent of the total coding length. There- 
fore, improving it can bring the highest reward. The improvement can 
be obtained by reducing the number of library patterns or by coding 
the pattern description more efficiently. The next most bit-consuming 
part is the coding of the horizontal position; it uses about 20 percent 
of the total coding length. 
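
The breakdown can be read directly from Table III. The sketch below computes the share of each kind of code word for two of the official documents; the dictionary layout and names are illustrative only.

```python
CODE_WORDS = ("mode bit", "horizontal position", "no more pattern",
              "vertical move", "library identification", "library description")

# Bit counts per kind of code word, from Table III(a).
TABLE_III_A = {
    "CCITT1": (2376, 11847, 888, 1114, 4895, 43983),
    "CCITT4": (2376, 46464, 1854, 5724, 18584, 41138),
}


def shares(bits):
    """Percentage of the total coding length used by each kind of code word."""
    total = sum(bits)
    return {name: 100 * b / total for name, b in zip(CODE_WORDS, bits)}


for document, bits in TABLE_III_A.items():
    s = shares(bits)
    print(f"{document}: description {s['library description']:.1f}%, "
          f"horizontal position {s['horizontal position']:.1f}%")
```

Document one follows the general rule, with about two thirds of the bits spent on library descriptions. Document four, which is dense text with many repeated characters, is an exception: the horizontal positions take the larger share.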


6.3 Complexity 


Pattern-matching coding has the disadvantage of being complex 
and time-consuming, which is the price to pay for efficient coding. The 
most time-consuming parts are the isolation, the template matching, 
and the matching decision. The isolation is both complex and time- 
consuming, and is therefore the most difficult part, but by using fast 
logic it is possible to isolate all the patterns in about one second. The 
template matching is a simple operation, but it takes a long time. It is 
therefore less of a challenge, since it is easily done in parallel and with 
simple hardware. The most time-consuming part of the matching 
decision uses local operators on, for example, 3 × 3 windows and can 
therefore also be realized without much complication. Most of the 
high-level operations are much slower and can be done by 
microprocessors. This system should not be more complicated than that of Ref. 2. 

An important factor is that the decoding is much easier and faster 
than the coding, since there is no isolation or matching. Such a 
technique is therefore particularly suited for transmission with one 
sender and several receivers. 

An experimental pattern matcher has been built to show that the 
same kind of compression can be obtained when scanning real 
documents. By using a mixture of custom logic and programmed logic, 
transmission rates of up to 64 kb/s have been reached. A document 
is then usually sent in one to two seconds. 


VII. EXTENSIONS 


Several improvements or different applications of a pattern matcher 
can be considered. Some of them will be described here. 


7.1 Multipage document and prestored libraries 


When transmitting several pages of a document, the library from
one page can be used for the next page, thus reducing the number of
library patterns for each page. In such cases, compressions up to eight
times higher than with conventional coding techniques can be
achieved. If a few fonts are prestored in the coder and decoder, the
compression can be increased significantly.


7.2 Very high-quality transmission 

It is possible to use a tighter matching algorithm when even slight 
distortions are not tolerated. Such a mode can easily be implemented. 
It reduces the compression by an average of 15 percent. In that case, 
most of the postprocessing can be deleted. 


7.3 Standardization 


The CCITT is looking into standardizing facsimile coding tech- 
niques for future facsimile machines communicating over digital links 
(Group four facsimile apparatus). The modified READ code (also 
called two-dimensional CCITT code) has been standardized. The 
pattern matching coding technique has been proposed by AT&T to 
the CCITT as an optional coding technique yielding much higher 
compression. The only difference in the proposal compared to this 
paper is that no cross decomposition is applied. The compression is 
therefore slightly lower. 


7.4 High-resolution graphics 


Future scanners and coders will probably include resolutions higher 
than 200 pels/in. They will probably use 300 and 400 pels/in. The 
pattern matching technique can easily be modified for such resolution. 
The maximum size of the patterns should be increased to keep the 
coding efficient. In addition, the codes for the positioning of patterns 
must be slightly changed. The matching algorithm would stay un- 
changed. Compared to conventional techniques, the improvement in 
the compression will be as high and often even higher at such resolu- 
tions. 


VIII. CONCLUSION


A system for coding of facsimiles using pattern matching has been
described. It allows an important increase in the compression ratio
compared with a symbol matching system and gives a compression
ratio that is up to 4.8 times that of conventional facsimile coding
techniques. The improvement is naturally greater for printed text than
for handwritten text. It is felt that further significant improvements
are possible by better matching and coding. An important observation
is that pattern matching coding is very dependent on the digitization
and thresholding. Therefore, the combination of the thresholding and
the isolation could lead to significant improvements in compression.
Another consequence is that if a poor-quality scanner is used,
pattern matching will hardly lead to higher compression than
conventional facsimile codes. With modern electronic components, a
pattern matcher can be realized in hardware and would lead to an
important reduction in the transmission costs of high-volume
facsimiles.


Data-Transport Performance Analysis of Fasnet


By D. P. HEYMAN* 
(Manuscript received February 25, 1983) 


This paper presents a queueing model to assess the performance of Fasnet, 
a recently invented local-area network. Fasnet is intended for high-speed lines 
capable of carrying a wide mix of traffic. We confine our attention to data 
traffic only. An approximate formula for the expected delay of a packet is 
obtained; the approximate formula compares favorably to simulations of 
Fasnet. 


I. INTRODUCTION


Fasnet is an implicit token-passing local-area network.¹ It is
intended for high-speed lines capable of carrying a wide mix of traffic
(data, voice, video, and facsimile). In this paper, we present a queueing 
model to assess the performance of Fasnet with data traffic only. An 
approximation for the expected delay of a packet is obtained; the 
approximate solution compares favorably with measurements taken 
from a simulation of Fasnet. Our numerical results yield a mean delay 
that is less than 1 ms for a 1-kb packet when the line speed is 100 
Mb/s and the occupancy of the line is 0.9. 

Section II consists of a brief description of Fasnet. Section III 
describes our model and its approximate solution and Section IV 
presents comparisons with simulations. The effects of bursty traffic 
are given in Section V and our conclusions are stated in Section VI. 


II. A BRIEF DESCRIPTION OF FASNET


We will now give a description of Fasnet that will enable the reader 
to appreciate the model in Section III. A complete description is given 
in Ref. 1. 


* Bell Laboratories.


Fig. 1. Physical configuration of a Fasnet link. (Diagram labels
recovered from the figure: line B, read tap, directional coupler,
termination.)


The basic link as shown in Fig. 1 consists of two lines. One line 
carries traffic in one direction and the other line carries traffic in the 
reverse direction. For line A (which carries traffic from station 1 
towards station N), station 1 is called the head station and station 
N + 1 is called the end station. For line B, the roles are reversed. Each 
station makes two connections to each line. A read tap precedes a 
passive directional coupler used for writing. The signal read from the 
read tap will be unaffected by the signal being written simultaneously 
on the line via the directional coupler. Except for specific fields of the 
header, the protocol ensures that only one station at a time writes on 
the line. Thus, once a message is written on a line, it is not removed 
or changed by any station. 

The access control is similar for lines A and B. For line A, the head
station (station 1) initiates a cycle, which operates in the following
way. Once per cycle, each station with packets destined toward
the end station is allowed to access the line for a single time interval,
during which at most $p_{\max}$ packets can be sent. A station knows when
to place its packets on the line by reading a particular bit (called the
busy bit) in the access control field. This bit is added to the message
packet by the network layer of the protocol. Thus, in each cycle,
station 1 has the first opportunity to send packets, station 2 has the
second opportunity, and station N has the Nth and last* opportunity.
Each station has exactly one opportunity per cycle to send packets.
When station N + 1 receives a packet in which the busy bit indicates
that the packet has not been used, it sends a message to station 1
(using line B) to start a new cycle. There may be synchronization
delays at each end of the transmission of this message. The operation
of line B is identical to the operation of line A, with station N + 1 as
the head station and all flows reversed accordingly.
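
The access rule just described can be illustrated with a toy model of one cycle on line A. This is only a sketch under simplifying assumptions: it tracks queue lengths rather than bits, timing, or the busy-bit encoding, it ignores arrivals during the cycle, and the function name `run_cycle` and the list representation are illustrative choices, not part of the Fasnet specification.

```python
def run_cycle(queues, p_max):
    """Serve stations 1..N once, in order, on line A.

    queues: packets waiting at each station (index 0 is station 1).
    p_max:  most packets a station may send in its single access interval.
    Returns (sent, remaining), one entry per station.
    """
    sent, remaining = [], []
    for waiting in queues:
        n = min(waiting, p_max)   # a station never sends more than p_max
        sent.append(n)
        remaining.append(waiting - n)
    return sent, remaining


sent, remaining = run_cycle([2, 0, 5], p_max=3)
print(sent, remaining)   # [2, 0, 3] [0, 0, 2]
```

Passing `float("inf")` for `p_max` lets every station empty its buffer, which corresponds to the unbounded quota in this simplified picture.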


* We assume that station N + 1 will not send messages to itself.


III. THE MODEL


Fasnet behaves as an implicit token-passing protocol because control
passes from station to station as if a token were sent from station
1 to station 2, ..., to station N + 1 to station 1, and so forth. This
suggests that a queueing model of queues served in cyclic order would 
be appropriate. In this type of model, there is a single server that visits 
N + 1 different queues in the cyclic order described above. For Fasnet, 
the server is conceptual: it is the opportunity to place packets on the 
line. The service time is the length of time to write a packet. The 
queues correspond to the buffers at each station. 

Several papers have been written about queues served in cyclic
order. In most of these papers, it is assumed that there is exhaustive
service at each station. This means that the server processes all
customers waiting at the station at the epoch that the server reaches
the station. The most general model of exhaustive service is in
Eisenberg,² where each queue is of the M/G/1 type and the times to travel
between adjacent stations may depend on the pair of stations involved.
The solution of this model is in terms of transforms that are not given
in closed form; however, the equations can be solved numerically. A
special case of the model in Ref. 2 is treated in Konheim and Meister.³
Here, all service times are the constant $\Delta$, and all travel times between
adjacent stations have the same distribution, which is concentrated
on $\Delta, 2\Delta, \ldots$. Konheim and Meister are able to obtain closed-form
solutions for steady-state performance measures in this case. Their
results will be used in our analysis. The only paper where service is
not exhaustive is Ref. 4 by Eisenberg. That paper treats two M/M/1
type queues, and the server can process at most one customer during
a visit to a station. The solution to this model requires extensive
calculations, and the restriction to two stations is unrealistic for
Fasnet.

We have chosen to seek approximate solutions where each station
is of the M/D/1 type and service is either exhaustive or one-at-a-time
(as in Ref. 4). These correspond to $p_{\max} = \infty$ and $p_{\max} = 1$, respectively.
When the system is not heavily loaded, each station will have a small
load, so $p_{\max} = \infty$ should not behave much differently from $p_{\max} = 1$.
This behavior is exhibited by our approximate solution and by
simulations.

We will analyze our models by embedding them in a server-vacation
model. In an M/G/1/FIFO queue, assume that at the end of each busy
period the server takes a vacation. The vacations are iid random
variables generically denoted by T. The expected delay of a customer,
in the steady state, is given by

$$E(D^*) = E(D) + \frac{E(T^2)}{2E(T)}. \tag{1}$$

Here E(D) is the Pollaczek-Khintchine formula for the expected delay
in an M/G/1 queue: $E(D) = a b_2 / [2(1 - a b_1)]$, where a is the arrival rate
and $b_1$ and $b_2$ are the first and second moments of the service times.
A derivation of eq. (1) can be found in Levy and Yechiali.⁵ Our first
step is to describe the vacations.
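
As a numerical illustration of eq. (1), the following sketch evaluates the Pollaczek-Khintchine term and the vacation term separately. The function names and the sample numbers are hypothetical; only the two formulas come from the text.

```python
def pk_delay(a, b1, b2):
    """Pollaczek-Khintchine mean delay in an M/G/1 queue.

    a: arrival rate; b1, b2: first and second moments of the service time.
    """
    utilization = a * b1
    if utilization >= 1:
        raise ValueError("unstable queue: a * b1 must be below 1")
    return a * b2 / (2 * (1 - utilization))


def vacation_delay(a, b1, b2, t1, t2):
    """Eq. (1): M/G/1 delay plus the vacation term E(T^2) / (2 E(T)).

    t1, t2: first and second moments of the vacation length T.
    """
    return pk_delay(a, b1, b2) + t2 / (2 * t1)


# Constant service time of 10 (so b2 = 100) and constant vacations of 30.
print(pk_delay(0.05, 10.0, 100.0))
print(vacation_delay(0.05, 10.0, 100.0, 30.0, 900.0))
```

With these inputs the queueing term is about 5 and the vacation term is 15, so the vacation dominates the delay at moderate load.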


3.1 Assumptions and notation 


The notation used in this paper and some self-explanatory symbols
are given below.

N = number of potential transmitting stations.
$\lambda_i$ = packet arrival rate at station i.
$\Delta$ = constant service time/packet.
$\rho_i = \lambda_i\Delta$ = load due to station i.
$R = \sum_i \rho_i$ = load on the line.
$\Lambda = \sum_i \lambda_i$ = total arrival rate.
$\tau$ = one-way propagation delay.
$\gamma = 2\tau + \Delta$ = average overhead/cycle.

Lines A and B shown in Fig. 1 are the same except for direction, so
it is sufficient to model only line A. Notice that station N + 1 does
not send messages on line A and that station 1 does not send messages
on line B. We assume that packets to be transmitted appear at station
i according to a homogeneous Poisson process with rate $\lambda_i$, i = 1, 2,
..., N. The arrivals at stations i and j are independent when $i \neq j$.
We assume that all the packets contain the same number of bits. The
amount of time that a line spends taking a packet from a station is
the time required to read the bits of the packet. Therefore, the service
time of each packet is a constant, $\Delta$, say, where $\Delta$ is the number of
bits/packet divided by the line speed in b/s. The time for a bit to
travel between adjacent read taps is called the walk time; the average
walk time between stations is denoted by w.

Let $\tau$ be the time to send a bit from station 1 to station N + 1 (and
from N + 1 to 1). Then $2\tau$ is expended in each cycle to send a message
indicating that the busy bit must be reset. An average of one-half of a
packet processing time is lost in synchronization at each end station,
so $\gamma = 2\tau + \Delta$ is the average overhead per cycle. Notice that $\tau$ is the
time to walk from station 1 to station N + 1.


3.2 Analysis of the exhaustive service model 


In this subsection, we assume that the packet arrival rates are the
same at each station. We let $\lambda$ denote the common arrival rate and $\rho$
denote the common load on a station. The load on the line is $R = N\rho$.
We also assume that stations 1, 2, ..., N + 1 are evenly spaced along
the line.

Let T represent the amount of time between a departure from
station 1 and the beginning of the next visit to station 1. Then T is
the length of a vacation from station 1. By symmetry, T is the length
of a vacation between successive visits to any station. Thus, when a
station gains access to the line, an average of $\lambda E(T)$ packets are
present. With exhaustive service, the expected time to clear a station
is $\lambda E(T)\Delta/(1 - \rho)$. Since a vacation from station 1 consists of reading
packets at stations 2 through N and overhead,

$$E(T) = \gamma + (N - 1)\,\frac{\lambda E(T)\Delta}{1 - \rho},$$

so

$$E(T) = \frac{\gamma(1 - \rho)}{1 - R}, \qquad R < 1. \tag{2}$$

Equation (2) is a special case of eq. (54) in Ref. 2.
Let X denote the number of packets at station 1 at the end of a
vacation. Then

$$E(X) = \lambda E(T), \tag{3a}$$

and

$$\mathrm{Var}(X) = \lambda E(T) + \lambda^2\,\mathrm{Var}(T). \tag{3b}$$

Theorem 4.7 in Ref. 3 asserts that

$$\mathrm{Var}(X) = \frac{\gamma\lambda\,[1 - (N + 1)\rho + (2N - 1)\rho^2]}{(1 - R)^2}. \tag{4}$$

From eqs. (2), (3), and (4) we obtain

$$\frac{E(T^2)}{E(T)} = \frac{\rho(N - 1)\Delta + \gamma(1 - \rho)^2}{(1 - \rho)(1 - R)}. \tag{5}$$

Equation (5) is exact when the walk times between any pair of stations
are concentrated on $\Delta, 2\Delta, \ldots$. In Fasnet, we expect that the walk
times are much less than $\Delta$ because the packets travel between stations
at the speed of light and the read times are controlled by the speed of
the line. Thus, we should regard eq. (5) as an approximation.
From eqs. (1) and (5),

$$E(D^*) = \frac{\rho\Delta}{2(1 - \rho)} + \frac{\rho(N - 1)\Delta}{2(1 - \rho)(1 - R)} + \frac{\gamma(1 - \rho)}{2(1 - R)}
= \frac{R\Delta}{2(1 - R)} + \frac{\gamma(1 - \rho)}{2(1 - R)}, \qquad R < 1. \tag{6}$$

By symmetry, eq. (6) gives the expected delay of a packet at any
station. The expected number of packets in a buffer, E(Q), say, is
obtained from eq. (6) and Little's theorem:

$$E(Q) = \lambda E(D^*) = \frac{\rho R}{2(1 - R)} + \frac{\gamma\lambda(1 - \rho)}{2(1 - R)}. \tag{7}$$

Consequently, the expected number of customers in buffers 1 through
N is

$$N E(Q) = \frac{R^2}{2(1 - R)} + \frac{\gamma\Lambda(1 - \rho)}{2(1 - R)}. \tag{8}$$


Equation (7) is compared to simulation experiments in Section IV. 
An intuitive understanding of eq. (6) may be gained by considering
what happens for light loads. When $\rho$ is small, the vacations are almost
of constant length because (mostly) no customers are served during a
vacation. Then $E(T^2)$ is about $[E(T)]^2$. Now merge all the customers
to obtain an M/D/1 queue with mean delay $R\Delta/[2(1 - R)]$. Then eqs.
(1) and (2) yield

$$E(D^*) = \frac{R\Delta}{2(1 - R)} + \frac{\gamma(1 - \rho)}{2(1 - R)},$$

which is eq. (6). It is surprising that this heuristic light traffic argument
produces the exact (modulo our other approximations) result for any
R < 1.
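
Equations (6) and (7) are easy to evaluate directly. The sketch below does so for the parameter values the paper uses in Section IV (1000-bit packets at 100 Mb/s, 50 stations, propagation time equal to the packet time); the function names and the choice of microseconds as the time unit are assumptions made for this illustration.

```python
def exhaustive_delay(R, N, delta, gamma):
    """Eq. (6): expected packet delay under exhaustive service, equal loads."""
    if not 0 <= R < 1:
        raise ValueError("line load R must satisfy 0 <= R < 1")
    rho = R / N                      # load per station
    return (R * delta + gamma * (1 - rho)) / (2 * (1 - R))


def exhaustive_queue(R, N, delta, gamma):
    """Eq. (7) via Little's theorem: E(Q) = lambda * E(D*)."""
    lam = R / (N * delta)            # per-station arrival rate
    return lam * exhaustive_delay(R, N, delta, gamma)


# Times in microseconds: Delta = 10 and tau = 10, so gamma = 2*tau + Delta = 30.
for R in (0.8, 0.9):
    delay = exhaustive_delay(R, 50, 10.0, 30.0)
    queue = exhaustive_queue(R, 50, 10.0, 30.0)
    print(R, round(delay, 1), round(queue, 3))
```

The delays come to roughly 94 and 192 microseconds and the queue lengths to roughly 0.150 and 0.346, which agrees with the analysis column of Table I for unbounded $p_{\max}$.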


3.3 Analysis of the one-at-a-time service model 


In this subsection we will obtain an approximate solution to a model
where $p_{\max} = 1$. The approximation is based on an idea used in
Lehoczky, Sha, and Jensen⁶ for a similar model. We do not assume
that the arrival rates are the same at each station (as we did in Section
3.2).

A central notion in the approximation is the completion time of a 
station. The completion time of a station is the duration of the interval 
that starts when that station begins processing a packet, and ends at 
the first epoch that another packet may begin (does begin, provided it 
is present) its processing at the station. The purpose of the approxi- 
mate solution is to estimate the mean and variance of the completion 
time, use these moments in the Pollaczek-Khintchine formula, and 
apply eq. (1) to an appropriate server-vacation model. 

Let $V_i$ denote a generic completion time at station i, in the steady
state. Then

$V_i$ = cycle overhead + $\Delta$(1 + number of other stations sending a packet
in this cycle).

Let $p_i$ be the asymptotic proportion of time that packets are present
at station i. For T very large, the number of packets served at station
i by time T is

$$\frac{p_i T}{E(V_i)} + o(T),$$

where o(T) is some function such that $o(T)/T \to 0$ as $T \to \infty$. The
arrival rate at station i is $\lambda_i$. Equating the arrival and departure rates
yields

$$p_i = \lambda_i E(V_i). \tag{9}$$
Now we make our first approximation.

Approximation 1: In each cycle, the probability that station i
transmits a packet is $b_i \triangleq \lambda_i E(V_i)$.

The effect of approximation 1 is to use the long-run proportion $p_i$
as a probability for each cycle. Thus, the expected number of other
stations sending a packet during a completion time of station i is
$\sum_{j \neq i} b_j$, so

$$E(V_i) = \Delta\Bigl(1 + \sum_{j \neq i} b_j\Bigr) + \gamma. \tag{10}$$

Hence,

$$b_i \triangleq \lambda_i E(V_i) = \rho_i\Bigl(1 + \sum_{j \neq i} b_j\Bigr) + \lambda_i\gamma, \qquad i = 1, 2, \ldots, N. \tag{11}$$

The solution of eq. (11) is

$$b_i = \frac{\rho_i}{1 + \rho_i}\cdot\frac{1 + \gamma/\Delta}{1 - a}, \qquad a = \sum_{j=1}^{N} \frac{\rho_j}{1 + \rho_j}, \qquad i = 1, 2, \ldots, N, \tag{12}$$

which can be verified by substitution into eq. (11). When $\rho_i = \rho$ and
$\gamma = 0$, eq. (12) becomes

$$b_i = \frac{\rho}{1 - (N - 1)\rho} = \frac{R/N}{1 - \dfrac{N - 1}{N}R},$$

which is the approximation given in Lehoczky et al.⁶
Since $b_i$ must be no larger than one (because it is a probability), eq.
(12) constrains the feasible values of $\{\rho_i\}$. When all the arrival rates
have the common value $\lambda$, $a = N\rho/(1 + \rho) = R/(1 + \rho)$, so

$$b_i < 1 \text{ for all } i \iff N > \frac{R\gamma/\Delta}{1 - R}. \tag{13}$$

Equation (13) shows that for a given total load, R, stability is
achieved only when the load is shared by a sufficiently large number
of stations. For Fasnet, $\gamma/\Delta = 3$ is a typical value. Then eq. (13)
asserts that for R = 0.9, N > 27 is required for stability; for R = 0.8,
N > 12 is required for stability; and for R = 0.5, N > 3 is required for
stability.
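
The stability condition of eq. (13) can be checked numerically. In this sketch the function names are hypothetical, and the equal-load form of the transmission probability, $b = \rho(1 + \gamma/\Delta)/(1 + \rho - R)$, is obtained by substituting $a = R/(1 + \rho)$ into eq. (12).

```python
def stability_bound(R, gamma_over_delta):
    """Right-hand side of eq. (13): stability needs N strictly above this."""
    return R * gamma_over_delta / (1 - R)


def transmit_probability(R, N, gamma_over_delta):
    """Eq. (12) with equal loads: b = rho*(1 + gamma/Delta) / (1 + rho - R)."""
    rho = R / N
    return rho * (1 + gamma_over_delta) / (1 + rho - R)


for R in (0.9, 0.8, 0.5):
    print(R, round(stability_bound(R, 3.0), 6))

# One station more or fewer than the bound at R = 0.9:
print(transmit_probability(0.9, 28, 3.0), transmit_probability(0.9, 26, 3.0))
```

With $\gamma/\Delta = 3$ the bounds come out at 27, 12, and 3, and the per-cycle transmission probability drops below one only once N exceeds the bound.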

Here is why N cannot be too small. When N is small, the overhead
per cycle ($\gamma$) will be spread over a few customers, which has the effect
of decreasing the capacity of the line. To see this more precisely, let c
be the expected number of packets processed in a cycle. Then use eq.
(11) to obtain

$$c = \sum_{i=1}^{N} b_i = \frac{a}{1 - a} + \frac{a\gamma/\Delta}{1 - a} = \frac{a(1 + \gamma/\Delta)}{1 - a}. \tag{14}$$

When all the arrival rates have the common value $\lambda$, eq. (14) yields

$$c = \frac{R(1 + \gamma/\Delta)}{1 + \rho - R}, \tag{15}$$

where $\rho = \lambda\Delta$. From eq. (15) we compute the average amount of
overhead expended per packet per cycle,* $\gamma/c$:

$$\frac{\gamma}{c} = \frac{\gamma(1 - R + R/N)}{R(1 + \gamma/\Delta)} = \frac{\gamma}{1 + \gamma/\Delta}\Bigl(\frac{1}{R} - 1 + \frac{1}{N}\Bigr). \tag{16}$$

Eq. (16) shows that the average overhead per packet is a decreasing
function of N, so if N were small, the overhead per packet might cause
the line to be overloaded.
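
A short numerical sketch makes the dependence on N concrete. The function names are illustrative, and the time unit (microseconds, with $\Delta = 10$ and $\gamma = 30$) is an assumption chosen to match the typical ratio $\gamma/\Delta = 3$ quoted above.

```python
def packets_per_cycle(R, N, delta, gamma):
    """Eq. (15): expected number of packets processed in a cycle."""
    rho = R / N
    return R * (1 + gamma / delta) / (1 + rho - R)


def overhead_per_packet(R, N, delta, gamma):
    """Eq. (16): average overhead per packet, gamma / c, for equal loads."""
    return gamma / (1 + gamma / delta) * (1 / R - 1 + 1 / N)


for N in (15, 50, 200):
    print(N, round(overhead_per_packet(0.8, N, 10.0, 30.0), 3))
```

The printed overhead falls as N grows, and `overhead_per_packet` agrees with `gamma / packets_per_cycle(...)`, which is the identity eq. (16) expresses.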

It is interesting to note that this consideration does not arise in
Section 3.2. When $p_{\max} < \infty$, from time to time a station will stop
transmitting packets because its quota for the cycle has been filled.
This is the effect that is shown in eq. (16). We conjecture that when
all other parameters are fixed, $\gamma/c$ is a decreasing function of $p_{\max}$.

Now we turn to an approximation for the mean delay. This will be
done by proposing a suitable server-vacation model and applying eq.
(1). The service-time moments in the Pollaczek-Khintchine formula
correspond to completion-time moments here. When a packet arrives
at station i, and no other packets are waiting to be transmitted at
station i, that packet cannot be transmitted until station i gains access
to the line. The length of time that station i does not have access to
the line is the length of time to process the other stations in a cycle
plus the overhead time, which is the completion time less one service
time. Thus, $T_i = V_i - \Delta$, so

$$E(T_i) = E(V_i) - \Delta, \tag{17}$$

and

$$\mathrm{Var}(T_i) = \mathrm{Var}(V_i). \tag{18}$$


* The fact that $\gamma/c$ is the right quantity to compute may not be obvious. A rigorous
proof could use ergodic theory for regenerative processes (see, e.g., Section 6-4 of
Heyman and Sobel,⁷ particularly Theorem 6-8).


From eqs. (10), (12), and (17) we can obtain $E(T_i)$. From eqs. (1) and
(18) we see that only $\mathrm{Var}(V_i)$ remains to be obtained. To get it, we
make our second approximation.

Approximation 2: In each cycle, the event that station i transmits a
packet is independent of the event that station j transmits a packet
for every $j \neq i$.

The effect of approximation 2 is that the variance of the number of
packets served at station i is $b_i(1 - b_i)$ and the variance of the number
of packets served in a cycle is $\sum_{i=1}^{N} b_i(1 - b_i)$. This yields the
approximation

$$\mathrm{Var}(V_i) = \Delta^2 \sum_{j \neq i} b_j(1 - b_j). \tag{19}$$

Using eqs. (10), (12), (17), (18), and (19) in eq. (1) produces our
approximation for the expected delay at station i, i = 1, 2, ..., N. The
resulting formula does not appear to provide any insight and is omitted.
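
Although the closed-form expression is omitted, the chain of substitutions is straightforward to carry out numerically. The following is a minimal sketch, not the author's program: it assumes the completion-time moments are used as the service-time moments in the Pollaczek-Khintchine formula (as the text states), assumes $\gamma > 0$, and uses a hypothetical function name.

```python
def one_at_a_time_delays(lams, delta, gamma):
    """Approximate mean delay at each station when p_max = 1.

    lams:  per-station Poisson arrival rates.
    delta: constant packet service time.
    gamma: overhead per cycle (assumed positive); same time unit as delta.
    """
    rhos = [lam * delta for lam in lams]
    a = sum(r / (1 + r) for r in rhos)
    if a >= 1:
        raise ValueError("infeasible load: a must be below 1")
    # Eq. (12): per-cycle transmission probabilities.
    b = [r / (1 + r) * (1 + gamma / delta) / (1 - a) for r in rhos]
    if max(b) >= 1:
        raise ValueError("unstable: some b_i >= 1, see eq. (13)")
    total_b = sum(b)
    total_var = sum(x * (1 - x) for x in b)
    delays = []
    for lam, bi in zip(lams, b):
        ev = delta * (1 + total_b - bi) + gamma            # eq. (10)
        var_v = delta ** 2 * (total_var - bi * (1 - bi))   # eq. (19)
        et = ev - delta                                    # eq. (17)
        et2 = var_v + et ** 2                              # eq. (18)
        pk = lam * (var_v + ev ** 2) / (2 * (1 - lam * ev))
        delays.append(pk + et2 / (2 * et))                 # eq. (1)
    return delays


# 50 identical stations, Delta = 10 and gamma = 30 (microseconds), R = 0.8.
lam = 0.8 / (50 * 10.0)
delay = one_at_a_time_delays([lam] * 50, 10.0, 30.0)[0]
print(round(delay, 1), round(lam * delay, 3))
```

For these inputs the delay is close to 130 microseconds and the mean queue length close to 0.209, which matches the $p_{\max} = 1$ analysis entries of Table I.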

As a partial check on the efficacy of our approximation for the mean
delay, we consider the limiting case of no overhead and identical
stations. In this situation, the total content of the output buffers at
stations 1, 2, ..., N fluctuates as the queue length in an M/D/1 queue
with arrival rate $\Lambda$ and service time $\Delta$. From the Pollaczek-Khintchine
formula, the expected queue length in the steady state, $E(Q_0)$, say, is

$$E(Q_0) = \frac{R^2}{2(1 - R)}, \qquad R < 1.$$

Our approximations produce (after some algebra)

$$\hat{E}(Q_0) = \frac{R}{2(1 - R + \rho)}\cdot\frac{\rho[R + (1 - R)(R - \rho)] + 1 - R}{1 - R}, \qquad R < 1.$$

Now let $N \to \infty$ and $\rho \downarrow 0$ with $R = N\rho$ held fixed. This represents a
system with many lightly loaded stations. Then

$$\hat{E}(Q_0) \to \frac{R}{2(1 - R)} \quad \text{as } \rho \downarrow 0.$$

In this limiting case, $\hat{E}(Q_0)$ overestimates $E(Q_0)$ by R/2, and the
relative error is $(1 - R)/R$. Thus, the absolute error increases with R
and is less than one-half, and the relative error decreases as the
absolute error increases.


IV. COMPARISONS WITH SIMULATIONS


A computer simulation of Fasnet has been constructed. There are
N = 50 sending stations equally spaced along the line. The propagation
time ($\tau$) equals the packet processing time at each station ($\Delta$).
We have chosen to consider 1000-bit packets and a line speed of 100
Mb/s, which is representative of the operating region. Then $\Delta$ = 10
μs, and $\lambda = (R/50) \times 10^5$ packets/s.

The measure of performance is the average queue length at the
stations. Specifically, the simulation estimates the steady-state
distribution of the queue length at station i and then computes the mean,
$E(Q_i)$, say. The average queue length is $\sum_{i=1}^{50} E(Q_i)/50 \triangleq \bar{E}(Q)$. The
corresponding quantity from our formulas is called E(Q). From the
queueing formula $E(Q) = \lambda E(D)$ we use E(Q) to estimate the average
delay of a packet, E(D).

Table I shows the results. 

The analytic approximation adequately replicates the simulation
results. Table I and Fig. 2 demonstrate that for R as large as 0.8,
$p_{\max} = 1$ and $p_{\max} = \infty$ produce nearly the same average queue size.
This means that the efficiency (in terms of not incurring too much
overhead) of $p_{\max} = \infty$ and the protection against a few stations
dominating the line of $p_{\max} = 1$ can be simultaneously obtained by
setting $1 < p_{\max} < \infty$. Table I shows that $p_{\max} = 3$ is almost as efficient
as $p_{\max} = \infty$.

The approximation for the mean delay of a packet is less than 1/2
ms even when R = 0.9 and $p_{\max} = 1$.


V. THE EFFECTS OF BURSTY TRAFFIC 


In this section we return to the exhaustive service model and replace 
the assumption that packets arrive according to a Poisson process 
with the assumption that packets arrive according to a compound 
Poisson process. Fuchs and Jackson give statistical analyses of arrival
times for terminal-to-computer calls.⁸ Two of their conclusions are as
follows:

1. The exponential distribution is a reasonably good approximation 
of the times between bursts. 


Table I. Comparison of simulations and analytic approximations

          p_max   E(Q) Simulation   E(Q) Analysis   E(D)
R = 0.8   1       0.216             0.209           130 μs
          3       0.184             -               -
          ∞       0.184             0.150           94 μs
R = 0.9   1       0.684             0.782           430 μs
          3       0.440             -               -
          ∞       0.398             0.346           192 μs


Fig. 2. Expected delay vs R for 1-kb packets, 50 stations, and 100-Mb/s line speed.
(Plot axes: average packet delay E(D) in microseconds against R = line load in
erlangs, from 0 to 1.0.)


2. The size of a burst (measured in various ways) has a geometric
distribution.

Recent analyses by Morgan of host-to-host file traffic indicate that
the assumption of Poisson arrivals may not be justified.⁹
The purpose of this section is to find out how sensitive the average 
delay is to the assumption of Poisson arrivals. We will see that in the 
exhaustive service model, the average delay can be significantly greater 
with bursty arrivals than with Poisson arrivals with the same rate. 
Specifically, we assume that the bursts arrive according to a
Poisson process with rate $\lambda_b$, and the burst sizes $B_1, B_2, \ldots$ are iid
with

$$P\{B_1 = i\} = (1 - \xi)\xi^{i-1}, \qquad i = 1, 2, \ldots.$$

This arrival process can be interpreted as one where messages arrive
according to a Poisson process with rate $\lambda_b$, and the jth message
consists of a random number of packets with a geometric distribution.
One reason for choosing a geometric distribution is that it equates the
average delay of a packet and the average delay of a message (see
Halfin¹⁰). (The delay of a message is the delay of its last packet.)
Another reason is that it uses only one parameter, and so we can
specify the mean (Mt, say) and variance (Vt, say) of the number of
arrivals in an interval of length t and then solve for $\lambda_b$ and $\xi$. Doing
so yields

$$\lambda_b = \frac{2M^2}{V + M} \quad\text{and}\quad \xi = \frac{V - M}{V + M}.$$

Letting $z = V/M \geq 1$ yields

$$\lambda_b = \frac{2M}{1 + z} \quad\text{and}\quad \xi = \frac{z - 1}{z + 1}. \tag{20}$$

Equation (20) relates the parameters we might obtain from
measurements, M and z, to the parameters of the model, $\lambda_b$ and $\xi$.
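
The mapping in eq. (20) and its inverse can be sketched in a few lines. The function names are hypothetical. The inverse uses the standard moments of a geometric burst size on 1, 2, ..., namely $E(B) = 1/(1 - \xi)$ and $E(B^2) = (1 + \xi)/(1 - \xi)^2$, so that the packet count in unit time has mean $\lambda_b E(B)$ and variance $\lambda_b E(B^2)$.

```python
def burst_parameters(M, z):
    """Eq. (20): burst rate and geometric parameter from M and z = V/M >= 1."""
    if z < 1:
        raise ValueError("z = V/M must be at least 1")
    lam_b = 2 * M / (1 + z)
    xi = (z - 1) / (z + 1)
    return lam_b, xi


def burst_moments(lam_b, xi):
    """Mean and variance per unit time (M, V) of the compound Poisson count."""
    mean = lam_b / (1 - xi)                       # lam_b * E(B)
    variance = lam_b * (1 + xi) / (1 - xi) ** 2   # lam_b * E(B^2)
    return mean, variance


lam_b, xi = burst_parameters(2.0, 3.0)
print(lam_b, xi)                  # 1.0 0.5
print(burst_moments(lam_b, xi))   # (2.0, 6.0)
```

Setting z = 1 gives $\xi = 0$ and $\lambda_b = M$, which recovers ordinary Poisson arrivals with single-packet bursts.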

We will now obtain the delay of an arbitrary packet in the steady 
state. The analysis is similar to the analysis in subsection 3.2; as 
before we assume that the stations have statistically identical arrival 
processes. 

The analog of the Pollaczek-Khintchine formula for compound
Poisson arrivals is given in Burke.¹¹ In our notation, the formula is

$$E(D) = \frac{\lambda_b\Delta^2(1 + \xi)}{2(1 - \xi)(1 - \rho)} + \frac{\xi\Delta}{1 - \xi}, \tag{21}$$

where $\rho = \lambda_b\Delta/(1 - \xi) = M\Delta$.

To obtain the mean and variance of the vacation times, let X
denote the number of packets at station 1 at the end of a vacation.
From theorem 4.7 in Ref. 3, for R < 1,

$$E(X) = \frac{\gamma M(1 - \rho)}{1 - R}, \tag{22a}$$

and

$$\mathrm{Var}(X) = \frac{\gamma V}{(1 - R)^2}\,[1 - (N + 1)\rho + (2N - 1)\rho^2]. \tag{22b}$$

Since X is the number of packets that arrive in an interval of length
T,

$$E(X) = M E(T), \tag{23a}$$

and

$$\begin{aligned}
\mathrm{Var}(X) &= E[\mathrm{Var}(X \mid T)] + \mathrm{Var}[E(X \mid T)] \\
&= E[VT] + \mathrm{Var}[MT] \\
&= V E(T) + M^2\,\mathrm{Var}(T).
\end{aligned} \tag{23b}$$

From eqs. (22) and (23) we obtain

$$\frac{E(T^2)}{E(T)} = \frac{\mathrm{Var}(X)}{M E(X)} - \frac{V}{M^2} + E(T)
= \frac{(N - 1)V\Delta^2 + \gamma(1 - \rho)^2}{(1 - \rho)(1 - R)}. \tag{24}$$
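Substituting eqs. (21) and (24) into eq. (1) yields

$$E(D^*) = \frac{M\Delta^2(1 + \xi)}{2(1 - \rho)} + \frac{\xi\Delta}{1 - \xi} + \frac{(N - 1)V\Delta^2 + \gamma(1 - \rho)^2}{2(1 - \rho)(1 - R)}.$$

Using eq. (20) yields

$$E(D^*) = \frac{R\Delta z}{2(1 - R)} + \frac{\gamma(1 - \rho)}{2(1 - R)} + \frac{(z - 1)\Delta}{2} - \frac{\rho\Delta z(z - 1)}{2(1 - \rho)(1 + z)}. \tag{25}$$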


To compare eqs. (6) and (25), let a subscript z denote batch arrivals
with variance to mean ratio z. Then

$$E(D_z^*) - E(D_1^*) = \frac{\Delta(z - 1)}{2(1 - R)} - \frac{\rho\Delta z(z - 1)}{2(1 - \rho)(1 + z)}.$$

When N is large, so that $\rho$ is much smaller than R, the first term
dominates, especially in heavy traffic.

Table II shows the values of $E(D_z^*)$. The case $E(D_1^*)$ represents
Poisson arrivals. The data are the same as in Section IV: $\Delta = \tau = 10$
μs, and N = 50.

Table II shows that bursty traffic can have much larger expected
delays than Poisson traffic with the same arrival rate. A crude
approximation of the increase is $E(D_z^*) \approx E(D_1^*)\sqrt{z - 1}$ for $2 \leq z \leq 10$.
Even when z = 10 and R = 0.9, the mean delay is less than 1 ms.
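
Equation (25) can be evaluated directly for the Section IV parameters. This is a sketch with a hypothetical function name, using microseconds as the time unit ($\Delta = 10$, $\gamma = 2\tau + \Delta = 30$).

```python
def bursty_delay(R, N, delta, gamma, z):
    """Eq. (25): expected delay with compound Poisson arrivals, z = V/M >= 1."""
    if not 0 <= R < 1:
        raise ValueError("line load R must satisfy 0 <= R < 1")
    if z < 1:
        raise ValueError("z = V/M must be at least 1")
    rho = R / N
    queueing = R * delta * z / (2 * (1 - R))
    vacation = gamma * (1 - rho) / (2 * (1 - R))
    within_burst = (z - 1) * delta / 2
    correction = rho * delta * z * (z - 1) / (2 * (1 - rho) * (1 + z))
    return queueing + vacation + within_burst - correction


for R in (0.8, 0.9):
    print(R, [round(bursty_delay(R, 50, 10.0, 30.0, z), 1) for z in (1, 2, 5, 10)])
```

At z = 1 the function reduces to eq. (6). Rounded to the nearest microsecond it gives 119 and 318 for z = 2 and 10 at R = 0.8, and 242, 392, and 642 at R = 0.9, in line with Table II; the R = 0.8, z = 5 value comes out near 193.5.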


VI. CONCLUSIONS 


We have three conclusions. The first is that the approximations
presented in Section III are sufficiently accurate for data transport
performance studies of Fasnet. The second is that $p_{\max} = 3$ appears to
be a good choice if the offered traffic is reasonably smooth (Poisson)
and approximately equal at all stations. The third is that Fasnet
should be able to provide 1-ms average-delay performance for bursty
traffic.


Table II. $E(D_z^*)$ in μs for several values of z

           E(D_1*)   E(D_2*)   E(D_5*)   E(D_10*)
R = 0.8    94        119       193       318
R = 0.9    192       242       392       642


Exponential Smoothing (ES) as a forecasting technique has been
extensively used since its introduction in the 1960s. It is simple, hence easy to
implement, and in many cases performs surprisingly well. However, many 
phenomena require a more sophisticated forecasting technique. In this paper 
we introduce a new forecasting technique, Adaptive Gradient Exponential 
Smoothing (AGES). This technique extends the classical ES as used on simple 
data or on data with linear trend. For data with both linear trend and seasonal 
effects this extension results in a new and more general form of ES, which is 
presented in this paper. The new forecasting technique is tested on simulated 
data and some real data of the types mentioned above, and its performance in 
all these tests is clearly superior to ES. It is shown by analysis and by the 
experimentations that for certain types of data it does in fact converge to the 
optimal (in the mean square error sense) forecasts. 


I. INTRODUCTION


The need for quick and reliable forecasts of various time series is 
often encountered in economic and business situations. In the Bell 
System, forecasting is used to help plan trunk and facilities for the 
telephone network,’* as well as to project computer workload, to 
determine staffing levels for operators or service observers, and more. 

Many forecasting techniques exist and different time series may
require different techniques. In general, there is a clear trade-off
between simplicity (resulting in cheaper implementation) and
performance of the forecasting technique. One of the simplest forecasting
techniques, Exponential Smoothing (ES), has surprisingly good
performance. This technique was presented originally by Winters⁴ and
Brown⁵ and is described briefly in Section II. In Ref. 6 the optimality
properties of ES are studied and we expand on these studies and use
the conclusions as the basis for a new technique we introduce here.


* Bell Laboratories.

In fact, these studies revealed a relationship between the ES and
the Autoregressive-Integrated Moving Average (ARIMA)
model-fitting-based forecasting suggested by Box and Jenkins.⁷ This is further
discussed in Section III.

The extensive use of ES clearly indicated that for time series with 
nonstationary discontinuities or changes in the generating parameters, 
ES performance is not satisfactory. This prompted a number of 
researchers to develop the Adaptive Exponential Smoothing (AES) 
idea. In these techniques the algorithm is supposedly evaluating its 
own performance and correcting its parameters to obtain improved 
performance. Recently, the existing AESs (see, for example, Refs. 8
through 11) were reviewed critically by Ekern.¹² One of the points
raised in Ref. 12 was that none of the existing AESs is supported by 
analysis or general performance claims (e.g., optimality). In addition, 
it should be pointed out that only Roberts’ and Reed’s AES”! can be 
used on data with both linear trend and seasonal effects, while the 
other AESs are limited to simpler data and have no natural generali- 
zation. 

In this paper we present a new AES algorithm, which we call 
Adaptive Gradient Exponential Smoothing (AGES). This technique 
naturally generalizes to data with both linear trend and seasonal effect. 
In addition, analysis of AGES for simple data and extensive simula- 
tions, using simple as well as more general data, strongly suggests that 
this technique converges to optimal performance in the mean square 
error (MSE) sense. 

Section II presents ES as commonly used. A new, more general form 
is developed with a discussion of its optimal properties. The new 
technique, AGES, is derived and presented in Section III, while the 
results of experiments with this technique on both real and simulated 
data are presented in Section IV. 


II. EXPONENTIAL SMOOTHING AND ITS OPTIMAL PROPERTIES


First we consider ES as Winters⁴ did for three types of data: simple*
(S), with linear trend (LT), and with both linear trend and multiplicative
seasonal effects (LSM). Common to all the configurations is


*Simple data are of the form a + n(t), where a is a fixed value and n(t) is noise with 
zero mean. 
the following: a time series {x(t)} is measured every time interval T
(e.g., hour, day, or week), and t is an integer representing the time tT.
Then, one is interested in forecasting the value x(t + 1)* based on the
data available up to and including t, namely x(0), x(1), ..., x(t).

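If $\hat{x}(t+1)$ denotes the forecast, carried out at time t for x(t + 1),
from Ref. 4 we have (using our own notation for consistency with the
discussions in the sequel), for S data:

$$\hat{x}(t+1) = \alpha x(t) + (1-\alpha)\hat{x}(t) \tag{1a}$$
$$0 \le \alpha \le 1; \tag{1b}$$

for LT data:

$$\hat{x}(t+1) = \hat{a}(t) + \hat{b}(t) \tag{2a}$$
$$\hat{a}(t) = \alpha x(t) + (1-\alpha)[\hat{a}(t-1) + \hat{b}(t-1)] \tag{2b}$$
$$\hat{b}(t) = \beta[\hat{a}(t) - \hat{a}(t-1)] + (1-\beta)\hat{b}(t-1) \tag{2c}$$
$$0 \le \alpha, \beta \le 1; \tag{2d}$$

and for LSM data:

$$\hat{x}(t+1) = (\hat{a}(t) + \hat{b}(t))\,\hat{c}(t-L+1) \tag{3a}$$
$$\hat{a}(t) = \alpha\frac{x(t)}{\hat{c}(t-L)} + (1-\alpha)[\hat{a}(t-1) + \hat{b}(t-1)] \tag{3b}$$
$$\hat{b}(t) = \beta[\hat{a}(t) - \hat{a}(t-1)] + (1-\beta)\hat{b}(t-1) \tag{3c}$$
$$\hat{c}(t) = \gamma\frac{x(t)}{\hat{a}(t)} + (1-\gamma)\hat{c}(t-L) \tag{3d}$$
$$0 \le \alpha, \beta, \gamma \le 1, \tag{3e}$$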


where L is the known periodicity of the season. 

In all the equations above, the parameters $\alpha$, $\beta$, and $\gamma$ are called the
“smoothing coefficients”.

Our first step is to rewrite eq. (1) and, more importantly, eq. (2). 
This provides the basis for a new form of ES for LSM data, more 
general than (3). The new form, which is a natural extension of (1) 
and (2), suggests types of data for which the ES algorithm can result 
in optimal (in the MSE sense) performance. 

Equation (1) can be readily rewritten as 


$$\hat{x}(t+1) = \theta_1\hat{x}(t) + (1-\theta_1)x(t), \tag{4a}$$


*Note that we restrict our discussions to one-interval-ahead forecasting with the 
understanding that it can be generalized to more time intervals ahead. 
where clearly

$$\theta_1 = 1 - \alpha. \tag{4b}$$

With some algebra one can show that eq. (2) is equivalent to

$$\hat{x}(t+1) = \theta_1\hat{x}(t) + \theta_2\hat{x}(t-1) + (2-\theta_1)x(t) - (1+\theta_2)x(t-1), \tag{5a}$$

where

$$\theta_1 = 2 - \alpha(1+\beta) \tag{5b}$$
$$\theta_2 = \alpha - 1. \tag{5c}$$
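The equivalence of (2) and (5) can be checked numerically. The following is a minimal Python sketch, not taken from the paper: the function names, the test series, and the initial estimates `a0` and `b0` are illustrative assumptions. It runs the recursion (2) and then regenerates the same forecasts from the difference form (5), seeded with the first two forecasts.

```python
def holt_forecasts(x, alpha, beta, a0, b0):
    """Forecasts from recursion (2); f[t] is the forecast of x[t] made at t - 1."""
    a, b = a0, b0
    f = [None, a0 + b0]                      # f[1] uses the assumed initial estimates
    for t in range(1, len(x)):
        a_new = alpha * x[t] + (1 - alpha) * (a + b)      # eq. (2b)
        b = beta * (a_new - a) + (1 - beta) * b           # eq. (2c)
        a = a_new
        f.append(a + b)                                   # eq. (2a): forecast of x[t + 1]
    return f


def difference_form_forecasts(x, alpha, beta, f1, f2):
    """Forecasts from eq. (5), seeded with the first two forecasts of recursion (2)."""
    theta1 = 2 - alpha * (1 + beta)                       # eq. (5b)
    theta2 = alpha - 1                                    # eq. (5c)
    g = [None, f1, f2]
    for t in range(2, len(x)):
        g.append(theta1 * g[t] + theta2 * g[t - 1]
                 + (2 - theta1) * x[t] - (1 + theta2) * x[t - 1])   # eq. (5a)
    return g


# Illustrative series: a linear trend plus a small deterministic disturbance.
x = [10 + 0.5 * t + ((-1) ** t) * 0.3 for t in range(30)]
f = holt_forecasts(x, alpha=0.3, beta=0.2, a0=x[0], b0=0.0)
g = difference_form_forecasts(x, 0.3, 0.2, f[1], f[2])
max_gap = max(abs(u - v) for u, v in zip(f[1:], g[1:]))
print(max_gap)   # largest difference between the two forecast sequences
```

The two sequences should agree to rounding error, since (5) is obtained from (2) by eliminating the level and slope estimates.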


The basic difference between (2) and (5) is that (5) reflects the 
assumption that the noise-free part of the data x(t) is generated by 
the difference equation 


$$y(t) - 2y(t-1) + y(t-2) = 0, \tag{6}$$

while (2) reflects the assumption that the solution of (6) is

$$y(t) = a + bt. \tag{7}$$

[Note that in (2) $\hat{a}(t)$ is the current estimate of ‘a + bt’ and $\hat{b}(t)$ is the
current estimate of ‘b.’]

The ES as given in (3) for LSM data is clearly based on the 
assumption that the noise-free part of the data has the form 


$$y(t) = (a + bt)\,c(t), \tag{8a}$$

where

$$c(t+L) = c(t). \tag{8b}$$

The difference equation satisfied by (8) is

$$y(t) - 2y(t-L) + y(t-2L) = 0, \tag{9}$$

and the corresponding ES is

$$\hat{x}(t+1) = \sum_{j=1}^{M}\theta_j\hat{x}(t-j+1) - \sum_{j=1}^{M}\theta_j x(t-j+1) + 2x(t-L+1) - x(t-2L+1). \tag{10}$$

The parameters $\theta_j$, j = 1, ..., M, and the constraints they have to
satisfy are discussed later. Also, the claimed correspondence between
(9) and (10) will become more apparent in later discussion.

At this point, however, we emphasize that while (7) is the general 
solution of (6), and thus (2) and (5) are equivalent, (8) is only one of 
many possible solutions of (9). Hence (10) represents an ES form that
is more general than (3).
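That (8) satisfies (9) follows by direct substitution, using only the periodicity (8b). A short verification, added here for completeness:

```latex
\begin{aligned}
y(t) - 2y(t-L) + y(t-2L)
  &= (a+bt)\,c(t) - 2\,[a + b(t-L)]\,c(t-L) + [a + b(t-2L)]\,c(t-2L) \\
  &= c(t)\,\{(a+bt) - 2[a + b(t-L)] + [a + b(t-2L)]\} \\
  &= c(t)\,\{bt - 2bt + 2bL + bt - 2bL\} = 0 .
\end{aligned}
```

The same substitution shows that any $y(t) = p(t) + t\,q(t)$ with p and q of period L also satisfies (9); form (8) is the special case in which p and q are proportional to each other.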

Similarly, data with linear trend and additive seasonal effects* 
(LSA) have the underlying difference equation 


$$y(t) - y(t-1) - y(t-L) + y(t-L-1) = 0 \tag{11}$$

and the corresponding ES is

$$\hat{x}(t+1) = \sum_{j=1}^{M}\theta_j\hat{x}(t-j+1) - \sum_{j=1}^{M}\theta_j x(t-j+1) + x(t) + x(t-L+1) - x(t-L). \tag{12}$$


To unify and simplify the discussions ahead we introduce the 
following notation. Let D be a unit delay operator, namely Dx(t) = 
x(t — 1), and let A(D) be a polynomial in D such that 


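$$A(D) = \begin{cases} 1 & \text{for S data} \\ 2 - D & \text{for LT data} \\ 2D^{L-1} - D^{2L-1} & \text{for LSM data} \\ 1 + D^{L-1} - D^{L} & \text{for LSA data.} \end{cases} \tag{13}$$

With these definitions (4), (5), (10), and (12) can be unified as

$$\hat{x}(t+1) = \sum_{j=1}^{M}\theta_j D^{j-1}(\hat{x}(t) - x(t)) + A(D)\,x(t), \tag{14}$$

where M = 1 will result in (4) and M = 2 in (5).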

It should also be pointed out that the ES as given by eq. (1) [(2)
or (3)] carries an implicit assumption: that one (two or three)
coefficient(s) suffice to smooth the data. In other words, M in (14)
is equal to one (two or three). The general form (14), however,
allows for a larger number of coefficients to get better
approximations.
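The operator A(D) of eq. (13) is easy to tabulate as a coefficient list, which also makes the link to the difference equations (6), (9), and (11) explicit. The sketch below is illustrative only; the function names and the list representation (entry k multiplies x(t - k)) are assumptions, not part of the paper.

```python
def a_coefficients(kind, period=None):
    """Coefficients of A(D) in eq. (13); entry k multiplies x(t - k)."""
    if kind == "S":
        return [1.0]
    if kind == "LT":
        return [2.0, -1.0]
    if period is None or period < 2:
        raise ValueError("seasonal data need a period L >= 2")
    if kind == "LSM":                       # 2 D^(L-1) - D^(2L-1)
        coeffs = [0.0] * (2 * period)
        coeffs[period - 1] = 2.0
        coeffs[2 * period - 1] = -1.0
        return coeffs
    if kind == "LSA":                       # 1 + D^(L-1) - D^L
        coeffs = [0.0] * (period + 1)
        coeffs[0] += 1.0
        coeffs[period - 1] += 1.0
        coeffs[period] -= 1.0
        return coeffs
    raise ValueError("unknown data type: " + kind)


def difference_operator(a_coeffs):
    """Coefficients of 1 - D A(D), the operator applied to the noise-free data."""
    return [1.0] + [-c for c in a_coeffs]


print(difference_operator(a_coefficients("LT")))   # [1.0, -2.0, 1.0], i.e. eq. (6)
```

For LSM and LSA data the same call reproduces the coefficients of (9) and (11), respectively.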

To observe the optimal properties of the ES forecasts we define the 
forecast error as 


$$e(t) = x(t) - \hat{x}(t) \tag{15}$$

and use as our criterion for the forecast quality the mean square error
(MSE), i.e., $E\{e^2(t)\}$. With this in mind, it is clear that optimal
performance is achieved if e(t) becomes a white noise sequence
(i.e., independent and identically distributed with zero mean). Namely,
the ES technique, while assuming knowledge of the generating process 
for the noise-free component of the data, attempts to “whiten” the 


*This type of data was not addressed in Ref. 4 and, as far as we know, no form of 
ES applicable to it was proposed before the one here. 
noise component. This attempt implies an underlying assumption that
the data are generated through, or at least approximated by, the
process

$$[1 - DA(D)]\,x(t) = \Big[1 - \sum_{j=1}^{M}\bar{\theta}_j D^j\Big]\epsilon(t), \tag{16}$$

where $\epsilon(t)$ is a white noise with variance $\sigma^2$.
Substituting (14) and (16) into (15) results in 


$$\Big[1 - \sum_{j=1}^{M}\theta_j D^j\Big]e(t) = \Big[1 - \sum_{j=1}^{M}\bar{\theta}_j D^j\Big]\epsilon(t). \tag{17}$$
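The substitution leading to (17) can be spelled out as follows. This short derivation is added for clarity; $\theta_j$ denotes the smoothing coefficients of eq. (14) and $\bar{\theta}_j$ the data-generating parameters of eq. (16).

```latex
\begin{aligned}
e(t+1) &= x(t+1) - \hat{x}(t+1) \\
       &= x(t+1) - A(D)\,x(t) - \sum_{j=1}^{M}\theta_j D^{j-1}\,[\hat{x}(t) - x(t)] \\
       &= [1 - D A(D)]\,x(t+1) + \sum_{j=1}^{M}\theta_j D^{j}\,e(t+1) .
\end{aligned}
```

Moving the sum to the left-hand side and replacing $[1 - DA(D)]x(t+1)$ by the right-hand side of (16) gives (17), written at time t + 1.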


This equation satisfied by e(t) is the basis of our claims for correspondence
between eqs. (9) and (10), and (11) and (12). Equation (17)
immediately suggests the conditions for optimal forecasting. First, to
get bounded MSE one must require:


Condition 1: All zeros of the polynomial $[1 - \sum_{j=1}^{M}\theta_j\lambda^j]$ are outside
the unit circle.


If, in addition, we also require:

Condition 2: $\theta_j = \bar{\theta}_j$, j = 1, 2, ..., M,


then, clearly, from eq. (17), e(t) will converge to $\epsilon(t)$ and optimal
forecasting (in the MSE sense) is achieved.


Remark 1: As we discussed here, the sufficiency of Conditions 1 and 
2 is quite obvious; however, they are also necessary. This 
is argued in Appendix A. 


Remark 2: In Ref. 4 $\alpha$ and $\beta$ for LT data are restricted to the interval
[0, 1], which corresponds to the set $S_2$ in Fig. 1. The actual
constraints follow from applying Condition 1 to the M = 2
case. This results in the set $S_1$ in Fig. 1, which clearly
contains $S_2$ and is considerably larger. Allowing for a larger
constraint set for $\theta_1$ and $\theta_2$ (or, correspondingly, $\alpha$, $\beta$) will
result in more cases for which ES could result in optimal
performance.
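For M = 2, Condition 1 can be tested directly from the zeros of $1 - \theta_1\lambda - \theta_2\lambda^2$ and compared with the triangle $S_1$ of Fig. 1. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and membership in $S_2$ is tested by inverting eqs. (5b) and (5c).

```python
import cmath


def zeros_outside_unit_circle(theta1, theta2):
    """Condition 1 for M = 2: zeros of 1 - theta1*z - theta2*z**2 lie outside |z| = 1."""
    if theta2 == 0.0:
        # Degenerate (at most linear) polynomial: single zero at 1/theta1, if any.
        return abs(theta1) < 1.0
    disc = cmath.sqrt(theta1 * theta1 + 4.0 * theta2)
    roots = [(-theta1 + disc) / (2.0 * theta2), (-theta1 - disc) / (2.0 * theta2)]
    return all(abs(r) > 1.0 for r in roots)


def in_s1(theta1, theta2):
    """Membership in the set S1 of Fig. 1."""
    return abs(theta2) < 1.0 and theta2 + theta1 < 1.0 and theta2 - theta1 < 1.0


def in_s2(theta1, theta2):
    """Membership in S2: invert eqs. (5b)-(5c) and test 0 <= alpha, beta <= 1."""
    alpha = theta2 + 1.0
    if alpha == 0.0:
        return theta1 == 2.0        # alpha = 0 maps every beta to the point (2, -1)
    if not 0.0 < alpha <= 1.0:
        return False
    beta = (2.0 - theta1) / alpha - 1.0
    return 0.0 <= beta <= 1.0


# The fixed coefficients used in Fig. 2 satisfy Condition 1 but lie outside S2.
print(zeros_outside_unit_circle(-0.3, -0.3), in_s1(-0.3, -0.3), in_s2(-0.3, -0.3))
```

The pair (-0.3, -0.3) is thus an admissible choice that the restriction (2d) would exclude.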


III. ADAPTIVE GRADIENT EXPONENTIAL SMOOTHING


In the previous section we argued that for data that can be approx- 
imated by (16), forecasting with ES of the form (14) can result in 
optimal performance in the MSE sense. To achieve this, Conditions 1 
and 2 must be satisfied. However, while Condition 1 can be satisfied
by proper choice of $\theta_j$, Condition 2 is, in general, hard to satisfy since
the values of $\bar{\theta}_j$ in eq. (16) are not known. Basically, the ARIMA
Fig. 1. The constraint sets:
$S_1 = \{(\theta_1, \theta_2): |\theta_2| < 1,\ \theta_2 + \theta_1 < 1,\ \theta_2 - \theta_1 < 1\}$
$S_2 = \{(\theta_1, \theta_2): \theta_1 = 2 - \alpha(1+\beta),\ \theta_2 = \alpha - 1,\ 0 \le \alpha, \beta \le 1\}$.


model-fitting-based forecasting⁷ deals with exactly this type of problem.
The $\bar{\theta}_j$'s of eq. (16) are estimated and these estimates are then
used as the $\theta_j$'s in eq. (14) in an attempt to satisfy Condition 2. In the
ES algorithm no such attempt is made. In practice, the forecasters
using ES choose some fixed values for the $\theta_j$, which satisfy Condition
1 [or even more restrictively, e.g., eq. (2d)]. These values are based on
intuition, experience, and familiarity with the data they forecast.
However, considerable differences between the underlying $\bar{\theta}_j$'s and
the chosen $\theta_j$'s can result in significant performance degradation. This
is demonstrated in Fig. 2 for the case M = 2. The MSE for this case
was computed in a closed form as a function of $\bar{\theta}_1$ and $\bar{\theta}_2$ for some
fixed $\theta_1$ and $\theta_2$ and graphed in the figure. Together with phenomena
like nonstationary discontinuity* and changes in the data-generating
process (i.e., the $\bar{\theta}_j$ change values), this resulted in unsatisfactory
performance of the ES. The realization of what may cause this poor
performance brought about the idea of using adaptive schemes where

*Step-like changes in the data. 
Fig. 2. The mean squared error (MSE) as a function of the data-generating
parameters $\bar{\theta}_1$ and $\bar{\theta}_2$ for M = 2. (The smoothing coefficients are fixed
at $\theta_1 = -0.3$, $\theta_2 = -0.3$.)


the $\theta_j$ are not fixed but are adjusted in an attempt to improve
performance.
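For M = 1 the cost of a mismatch can be written out explicitly. With a fixed coefficient $\theta_1$ satisfying Condition 1, eq. (17) reads $e(t) = \theta_1 e(t-1) + \epsilon(t) - \bar{\theta}_1\epsilon(t-1)$, and taking the stationary variance gives $E\{e^2\} = \sigma^2[1 + (\theta_1 - \bar{\theta}_1)^2/(1 - \theta_1^2)]$. This expression is derived here for illustration only (the closed form mentioned in the text is for M = 2 and is reported graphically); the function name is hypothetical.

```python
def es_mse_ratio(theta1, theta1_bar):
    """Stationary MSE of fixed-coefficient ES (M = 1) in units of sigma**2."""
    if abs(theta1) >= 1.0:
        raise ValueError("Condition 1 requires |theta1| < 1")
    return 1.0 + (theta1 - theta1_bar) ** 2 / (1.0 - theta1 ** 2)


# MSE penalty of a fixed smoothing coefficient as the data-generating value varies.
for theta1_bar in (-0.8, -0.4, 0.0, 0.4, 0.8):
    print(theta1_bar, round(es_mse_ratio(0.8, theta1_bar), 3))
```

The ratio equals one only when the chosen coefficient matches the data-generating one, and it grows quickly when $|\theta_1|$ is close to one.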

Compared to the existing Adaptive Exponential Smoothing (AES)
techniques (see, e.g., Refs. 8 through 11), the new technique we
introduce here is analytically more sound and there are strong
indications that it converges to optimal performance in the MSE sense
for the data approximated by (16).

This new technique is based on the gradient search for the minimum
of the MSE. If the MSE were available as a function of
the $\theta_j$, then one could compute the gradient

$$\nabla = \frac{\partial E\{e^2(t)\}}{\partial\theta}, \tag{18}$$

where $\theta = [\theta_1, \theta_2, \ldots, \theta_M]$, and recursively update the $\theta_j$ through

$$\theta(t+1) = \theta(t) - \mu\nabla, \tag{19}$$

where $\mu > 0$ is the adaptation constant. This is the gradient search
technique, sometimes referred to as the steepest descent technique. In
general, however, the MSE is not available as a function of the $\theta_j$;
hence, neither is the gradient. Instead, we use an instantaneous
estimate of this gradient. To get this estimate we replace $E\{e^2(t)\}$ by
$e^2(t)$ and the gradient by

$$\hat{\nabla} = \frac{\partial e^2(t)}{\partial\theta} = 2e(t)\frac{\partial e(t)}{\partial\theta}. \tag{20}$$

Let us denote

$$s(t) = \frac{\partial e(t)}{\partial\theta}$$

as the “sensitivity vector,” since it gives an indication of how sensitive
the error e(t) is to the values of $\theta$.

While s(t) is not available we can use eq. (17) to develop a means
for generating it. Let us take partial derivatives of both sides of this
equation with respect to $\theta$. Since the right-hand side does not depend
explicitly on $\theta$ we get:

$$\Big(1 - \sum_{i=1}^{M}\theta_i D^i\Big)s_j(t) = D^j e(t), \qquad j = 1, 2, \ldots, M. \tag{21}$$


At this point we are ready to introduce the Adaptive Gradient
Exponential Smoothing (AGES) technique. Combining eqs. (14), (19),
(20), and (21) we get:

the forecast:

$$\hat{x}(t+1) = A(D)\,x(t) - \sum_{i=1}^{M}\theta_i(t)\,e(t-i+1) \tag{22}$$

[see definition of A(D) in eq. (13)],

the sensitivity functions:

$$s_j(t+1) = \sum_{i=1}^{M}\theta_i(t)\,s_j(t-i+1) + e(t-j+1), \qquad j = 1, 2, \ldots, M, \tag{23}$$

and the coefficient adjustments:

$$\theta_j(t+1) = \theta_j(t) - 2\mu\,e(t)\,s_j(t), \qquad j = 1, 2, \ldots, M. \tag{24}$$

Recall that the error $e(t) = x(t) - \hat{x}(t)$.
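The three recursions (22) through (24) translate almost line by line into code. The following Python sketch is one possible reading of them, not the author's implementation: the function names, the zero initial conditions, the coefficient-list representation of A(D), and the simulated test series (type S data generated from eq. (16) with M = 1) are all assumptions made for illustration. No safeguard keeps the coefficients inside the Condition 1 region.

```python
import random


def simulate_simple_data(n, theta_bar, sigma=1.0, seed=0):
    """Type S data from eq. (16) with M = 1: (1 - D) x(t) = (1 - theta_bar D) eps(t)."""
    rng = random.Random(seed)
    x, prev_x, prev_eps = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        prev_x = prev_x + eps - theta_bar * prev_eps
        prev_eps = eps
        x.append(prev_x)
    return x


def ages_forecast(x, a_coeffs, m, mu, theta0=None):
    """One-step AGES run; a_coeffs[k] multiplies x(t - k) in A(D) x(t)."""
    theta = list(theta0) if theta0 is not None else [0.0] * m
    e_hist = [0.0] * m                       # e(t), e(t-1), ..., e(t-m+1)
    s_hist = [[0.0] * m for _ in range(m)]   # s_hist[j][i] holds s_{j+1}(t - i)
    x_hat = None                             # no forecast until enough history exists
    errors, theta_path = [], []
    for t in range(len(x)):
        e = x[t] - x_hat if x_hat is not None else 0.0
        e_hist = [e] + e_hist[:-1]
        if t >= len(a_coeffs) - 1:
            trend = sum(c * x[t - k] for k, c in enumerate(a_coeffs))
            x_hat = trend - sum(theta[i] * e_hist[i] for i in range(m))        # eq. (22)
        new_s = [sum(theta[i] * s_hist[j][i] for i in range(m)) + e_hist[j]
                 for j in range(m)]                                            # eq. (23)
        theta = [theta[j] - 2.0 * mu * e * s_hist[j][0] for j in range(m)]     # eq. (24)
        s_hist = [[new_s[j]] + s_hist[j][:-1] for j in range(m)]
        errors.append(e)
        theta_path.append(list(theta))
    return errors, theta_path


x = simulate_simple_data(20000, theta_bar=0.5, seed=1)
errors, theta_path = ages_forecast(x, a_coeffs=[1.0], m=1, mu=0.002)
tail = theta_path[10000:]
print(sum(p[0] for p in tail) / len(tail))   # average theta_1(t) over the second half
```

With a small adaptation constant the averaged coefficient is expected to settle near the data-generating value 0.5, and the tail MSE near the noise variance. For LT data one would pass `a_coeffs=[2.0, -1.0]` and `m=2`.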

Both our simulations and our experiments (as described in the next
section) strongly indicate that AGES converges to optimal performance
through convergence of $\theta_j(t)$ to $\bar{\theta}_j$. Namely, the error e(t) is
adaptively whitened. Despite these indications, since the resulting
equations are quite complex, a global proof of convergence of the
AGES technique is beyond the scope of this paper. However, we
conclude this section by treating the special case M = 1 and show
local convergence properties for it.

Let M = 1; then eqs. (17), (23), and (24) become

$$e(t+1) = \theta_1(t)\,e(t) + \epsilon(t+1) - \bar{\theta}_1\epsilon(t)$$
$$s_1(t+1) = \theta_1(t)\,s_1(t) + e(t),$$

and

$$\theta_1(t+1) = \theta_1(t) - 2\mu\,e(t)\,s_1(t).$$

Assuming $\theta_1(t)$ is independent of e(t) and $s_1(t)$ (similar assumptions
are common in convergence proofs of adaptive filters) and observing
that $E\{e(t)\epsilon(t)\} = \sigma^2$, $E\{e(t)\epsilon(t+1)\} = 0$, $E\{s_1(t)\epsilon(t)\} = 0$ we get

$$\begin{aligned}
E\{e^2(t+1)\} &= E\{\theta_1^2(t)\}\,E\{e^2(t)\} + \sigma^2[1 + \bar{\theta}_1^2 - 2\bar{\theta}_1 E\{\theta_1(t)\}] \\
E\{s_1(t+1)\,e(t+1)\} &= E\{\theta_1^2(t)\}\,E\{s_1(t)\,e(t)\} + E\{\theta_1(t)\}\,E\{e^2(t)\} - \bar{\theta}_1\sigma^2 \\
E\{\theta_1(t+1)\} &= E\{\theta_1(t)\} - 2\mu E\{s_1(t)\,e(t)\}.
\end{aligned} \tag{25}$$


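If we assume in addition that $\theta_1(t)$ has a small variance, namely
$E\{\theta_1^2(t)\} \approx [E\{\theta_1(t)\}]^2$ (the simulation results tend to support this
assumption), defining

$$\begin{aligned}
y_1(t) &= E\{e^2(t)\} - \sigma^2 \\
y_2(t) &= E\{s_1(t)\,e(t)\} \\
y_3(t) &= E\{\theta_1(t)\} - \bar{\theta}_1
\end{aligned} \tag{26}$$

and substituting in (25) results in

$$\begin{aligned}
y_1(t+1) &= [y_3(t) + \bar{\theta}_1]^2 y_1(t) + \sigma^2[y_3(t)]^2 \\
y_2(t+1) &= [y_3(t) + \bar{\theta}_1]^2 y_2(t) + [y_3(t) + \bar{\theta}_1]\,y_1(t) + \sigma^2 y_3(t) \\
y_3(t+1) &= y_3(t) - 2\mu y_2(t).
\end{aligned} \tag{27}$$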


Clearly, if we could prove that $y_1(t)$, $y_2(t)$, and $y_3(t)$ converge to the
origin globally (i.e., independent of the initial values), it would mean
that [see eq. (26)] the MSE converges to the minimum $\sigma^2$ and $E\{\theta_1(t)\}$
converges to $\bar{\theta}_1$. However, despite strong indications from our
simulations that these variables do converge globally, we can prove only local
convergence. In addition, the proof provides an indication as to how
to choose the parameter $\mu$.

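Let us linearize eq. (27) around the origin to get

$$\begin{aligned}
y_1(t+1) &= \bar{\theta}_1^2 y_1(t) \\
y_2(t+1) &= \bar{\theta}_1^2 y_2(t) + \bar{\theta}_1 y_1(t) + \sigma^2 y_3(t) \\
y_3(t+1) &= y_3(t) - 2\mu y_2(t).
\end{aligned} \tag{28}$$

The coefficients matrix is

$$A = \begin{bmatrix} \bar{\theta}_1^2 & 0 & 0 \\ \bar{\theta}_1 & \bar{\theta}_1^2 & \sigma^2 \\ 0 & -2\mu & 1 \end{bmatrix}$$

and to ensure convergence all eigenvalues of A must be within the unit
circle. The eigenvalues of A are

$$\lambda_1 = \bar{\theta}_1^2$$
$$\lambda_{2,3} = \tfrac{1}{2}\{1 + \bar{\theta}_1^2 \pm [(1 - \bar{\theta}_1^2)^2 - 8\mu\sigma^2]^{1/2}\},$$

and it can be verified that choosing

$$\mu < \frac{1 - \bar{\theta}_1^2}{2\sigma^2} \tag{29}$$

will guarantee the convergence of eq. (28).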

Condition (29) implies that if $|\bar{\theta}_1|$ is close to one, $\mu$ must be chosen
very small and the convergence will be slow. Again, our simulation
experiments verified this observation.
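The step-size bound can be checked numerically from the eigenvalue expressions above. The sketch below is illustrative; the function names and the sample values are assumptions. It evaluates the three eigenvalues of the linearized system and their largest modulus for step sizes on either side of the bound (29).

```python
import cmath


def linearized_eigenvalues(theta1_bar, sigma2, mu):
    """Eigenvalues of the coefficient matrix A of the linearized system (28)."""
    lam1 = theta1_bar ** 2
    root = cmath.sqrt((1.0 - theta1_bar ** 2) ** 2 - 8.0 * mu * sigma2)
    lam2 = 0.5 * (1.0 + theta1_bar ** 2 + root)
    lam3 = 0.5 * (1.0 + theta1_bar ** 2 - root)
    return lam1, lam2, lam3


def step_size_bound(theta1_bar, sigma2):
    """Upper bound on mu from condition (29)."""
    return (1.0 - theta1_bar ** 2) / (2.0 * sigma2)


def spectral_radius(theta1_bar, sigma2, mu):
    """Largest eigenvalue modulus; below one means local convergence of (28)."""
    return max(abs(lam) for lam in linearized_eigenvalues(theta1_bar, sigma2, mu))


bound = step_size_bound(0.5, 1.0)
print(bound, spectral_radius(0.5, 1.0, 0.8 * bound), spectral_radius(0.5, 1.0, 1.2 * bound))
```

When the square root is imaginary, the squared modulus of the complex pair is $\bar{\theta}_1^2 + 2\mu\sigma^2$, which is where the bound (29) comes from.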


IV. SIMULATION RESULTS 


We divide our experiments with AGES into two parts. In the first 
part we applied both ES and AGES on data generated by the computer 


Fig. 3. Comparison of forecasting performance between ES ($\theta_1 = 0.8$) and AGES.
(Curves for AGES, Adaptive Gradient Exponential Smoothing, and ES, Exponential
Smoothing: mean squared error, in units of $\sigma^2$, versus $\bar{\theta}_1$.)
and compared the results. In the second part we applied the AGES to
real data that we took from Ref. 7.

Equation (16) was used to generate data of type S, LT, and LSM by
the computer. The results of applying both ES and AGES on these
data are presented in Figs. 3, 4, and 5 and in Table I. Each point on
the curves of Fig. 3 corresponds to a complete run on a sequence of
data generated with the particular choice of the $\bar{\theta}_j$. The resulting MSE
for the ES and the AGES forecasts are presented and the comparison
clearly indicates the superiority of the AGES algorithm. In addition,
we observe that AGES results, in almost all the runs, in an MSE
very close to the minimum, $\sigma^2$.

In Fig. 5, we followed the variation of the $\theta_j(t)$ with time in a number
of runs. The results clearly show that the $\theta_j(t)$ converge to the $\bar{\theta}_j$ from
a variety of initial values; this indicates global convergence properties.
Similar results are observed in Table I for data with seasonal
multiplicative effects and linear trend. The $\theta_j(t)$ clearly converge to the $\bar{\theta}_j$'s,
and the MSE, when AGES is applied, is again very close to the optimal


Fig. 4. Comparison of mean squared error (in units of $\sigma^2$) in forecasting with
ES ($\theta_1 = \theta_2 = -0.3$) and AGES as a function of the data-generating
parameters $\bar{\theta}_1$ and $\bar{\theta}_2$. (a) $\bar{\theta}_2 = -0.9$. (b) $\bar{\theta}_1 = -0.6$.
Fig. 5. Convergence of: (a) $\theta_1(t)$ to the optimal value $\bar{\theta}_1$ in the AGES method.
(b) $\theta_1(t)$ and $\theta_2(t)$ from various initial conditions to $\bar{\theta}_1$ and $\bar{\theta}_2$, using
AGES on data with linear trend.


value, $\sigma^2$. From Ref. 7 we took data of the simple kind (no linear trend
or seasonal effects): the IBM common stock closing prices, daily,
from May 17, 1961 through November 2, 1962. On the data we applied
both ES and AGES and the results are presented in Fig. 6. Each point
on the curves corresponds to a run on the same data, each time with
a different coefficient (for the ES) and different initial condition (for
the AGES). The further the coefficient used in the ES is from $\bar{\theta}_1$
(which in this case is equal to -0.1, as indicated in Ref. 7), the greater
the advantage of AGES over ES.

Further experiments were conducted on monthly international airline
passengers data.⁷ These data, as Fig. 7 indicates, are with linear
trend and multiplicative seasonal effects. We applied the AGES algorithm
(with M = 3) and the results are presented in Fig. 8. In Ref. 7
it is claimed that sometimes rather than work with the actual data it
Table I. Comparison of MSE in forecasting with ES ($\theta_1 = -0.2$, $\theta_2 = 0.5$,
$\theta_3 = 0.4$) and AGES

Data-generating coefficients $\bar{\theta}_j$ (adaptive coefficients $\theta_j$ in parentheses)*; MSE in units of $\sigma^2$

theta1-bar (theta1)   theta2-bar (theta2)   theta3-bar (theta3)   MSE, AGES   MSE, ES
 1.4  (1.39)          -1.3  (-1.29)          0.8 (0.75)           1.1972      11.7155
 2.1  (1.97)          -1.95 (-1.79)          0.8 (0.68)           1.3831      20.5821
 0.75 (0.69)          -0.6  (-0.56)          0.8 (0.78)           1.0749       5.1431
 0.6  (0.6)           -0.75 (-0.76)          0.8 (0.77)           1.0600       5.0256
-0.75 (-0.73)          0.6  (0.6)            0.8 (0.79)           1.0663       1.3577
 0.0  (-0.04)          0.0  (0.01)           0.0 (0.03)           1.0040       1.5737
 1.0  (0.98)          -1.0  (-0.99)          1.0 (0.94)           1.2001       8.6414
-0.2  (-0.16)          0.5  (0.49)           0.4 (0.4)            1.0397       1.0137
-0.1  (-0.09)          0.25 (0.26)           0.4 (0.41)           0.9973       1.0839
 1.2  (1.19)          -0.9  (-0.81)          0.4 (0.36)           1.1044       7.1069
 1.8  (1.74)          -1.35 (-1.24)          0.4 (0.34)           1.2192      13.0215
 0.3  (0.31)          -0.75 (-0.75)          0.4 (0.38)           1.0574       3.8630
 1.0  (0.96)          -0.5  (-0.48)          0.0 (-0.04)          1.0605       4.2218
 1.5  (1.42)          -0.75 (-0.66)          0.0 (-0.05)          1.1165       6.9455
 0.75 (0.77)           0.0  (0.03)           0.0 (0.03)           1.0212       2.4687
 0.0  (-0.03)         -0.75 (-0.74)          0.0 (0.02)           1.0314       3.5907
 0.2  (0.19)           0.5  (0.53)          -0.4 (-0.43)          1.0186       1.7692
 1.2  (1.19)          -0.15 (-0.14)         -0.4 (-0.39)          1.1115       4.0338
-1.8  (1.7)           -1.35 (-1.22)         -0.4 (-0.36)          1.2077      13.8494
-0.3  (0.3)           -0.75 (-0.76)         -0.4 (-0.39)          1.0438       4.6957
-0.75 (-0.79)         -0.3  (-0.28)         -0.4 (-0.38)          1.0062       3.9623
 0.4  (0.43)           0.5  (0.49)          -0.8 (-0.78)          1.0461       2.6126
-0.7  (-0.73)         -0.65 (-0.62)         -0.8 (-0.8)           1.0670       6.1628
-0.6  (-0.61)         -0.75 (-0.75)         -0.8 (-0.79)          1.0815       6.8677
-0.75 (-0.74)         -0.6  (-0.56)         -0.8 (-0.74)          1.1065       7.0182
-0.5  (-0.52)         -0.4  (-0.38)         -0.8 (-0.81)          1.0759       5.1676

* The values to which $\theta_j(t)$ converge are given in parentheses.


is more convenient to work with the logarithm of the data. As we 
argue in Appendix B, these logarithms, as data, have linear trend and 
additive seasonal effects (see Fig. 8). Hence, on the logarithms we 
applied AGES for linear trend and additive seasonal effects and the 
results are presented in Fig. 8a (M = 3). We used the same data (the 
logarithms) to see whether the performance improves with larger M. 
AGES was applied with M = 13 and the results, as presented in Fig.
8b, clearly indicate that for these data M = 3 was sufficient.


V. CONCLUSIONS 


In this paper we have introduced a new forecasting technique, 
Adaptive Gradient Exponential Smoothing (AGES), which is based 
on Exponential Smoothing (ES). We have elaborated on the optimality 
properties in the MSE sense of the ES. For certain types of data, the
ES can result in optimal performance provided some coefficients are
known. In general, these coefficients are unavailable, and the AGES 
shows strong indications of converging to these unknown coefficients 
and providing optimal performance. 
Fig. 6. Comparison of performance of forecasting with the ES (varying the coefficient
in each run) and the AGES methods.


Fig. 7. Forecasting with AGES international airline passengers (M = 3). (Note that
these data have linear trend and multiplicative seasonal effects.) Axes: number of
passengers in thousands versus number of months (0 to 144); the actual data are marked.


Clearly, more extensive experiments and practical use of the pro- 
posed forecasting technique, the AGES, are required. A user-friendly 
software package can be developed for implementation of this tech- 
nique if sufficient interest is generated. 
Fig. 8. Forecasting with AGES the logarithm of the data in Fig. 7 for: (a) M = 3.
(b) M = 13. (Note that the logarithm of the data in Fig. 7 has the form of data with
additive seasonal effects and linear trend.)
APPENDIX A


Necessity of Conditions 1 and 2 for the Convergence of e(t) to $\epsilon(t)$ in
Equation (17)

Condition 1 is clearly necessary (as well as sufficient) for the
convergence of $E\{e^2(t)\}$ to a finite value. We want to show that
Condition 2 is necessary for $E\{e^2(t)\}$ to converge to $\sigma^2$.


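Let

$$\gamma_{ee}(\tau) = E\{e(t)\,e(t-\tau)\} \tag{30}$$

and

$$\gamma_{e\epsilon}(\tau) = E\{e(t)\,\epsilon(t-\tau)\}. \tag{31}$$

Clearly,

$$\gamma_{ee}(\tau) = \gamma_{ee}(-\tau) \tag{32}$$

and, from eq. (17) and the definition of $\epsilon(t)$,

$$\gamma_{e\epsilon}(-\tau) = 0, \qquad \tau > 0. \tag{33}$$

With these definitions it follows from eq. (17), after transients die,
that

$$\begin{aligned}
\gamma_{e\epsilon}(0) &= \sigma^2 \\
\gamma_{e\epsilon}(1) - \theta_1\gamma_{e\epsilon}(0) &= -\bar{\theta}_1\sigma^2 \\
\gamma_{e\epsilon}(2) - \theta_1\gamma_{e\epsilon}(1) - \theta_2\gamma_{e\epsilon}(0) &= -\bar{\theta}_2\sigma^2 \\
&\ \,\vdots \\
\gamma_{e\epsilon}(M) - \theta_1\gamma_{e\epsilon}(M-1) - \cdots - \theta_M\gamma_{e\epsilon}(0) &= -\bar{\theta}_M\sigma^2,
\end{aligned}$$

or in matrix form

$$\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\theta_1 & 1 & 0 & \cdots & 0 \\ -\theta_2 & -\theta_1 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -\theta_M & -\theta_{M-1} & \cdots & -\theta_1 & 1 \end{bmatrix} \begin{bmatrix} \gamma_{e\epsilon}(0) \\ \gamma_{e\epsilon}(1) \\ \gamma_{e\epsilon}(2) \\ \vdots \\ \gamma_{e\epsilon}(M) \end{bmatrix} = -\sigma^2 \begin{bmatrix} -1 \\ \bar{\theta}_1 \\ \bar{\theta}_2 \\ \vdots \\ \bar{\theta}_M \end{bmatrix}. \tag{34}$$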
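Also,

$$\begin{aligned}
\gamma_{ee}(0) - \theta_1\gamma_{ee}(1) - \theta_2\gamma_{ee}(2) - \cdots - \theta_M\gamma_{ee}(M) &= \gamma_{e\epsilon}(0) - \bar{\theta}_1\gamma_{e\epsilon}(1) - \cdots - \bar{\theta}_M\gamma_{e\epsilon}(M) \\
\gamma_{ee}(1) - \theta_1\gamma_{ee}(0) - \theta_2\gamma_{ee}(1) - \cdots - \theta_M\gamma_{ee}(M-1) &= -\bar{\theta}_1\gamma_{e\epsilon}(0) - \cdots - \bar{\theta}_M\gamma_{e\epsilon}(M-1) \\
\gamma_{ee}(2) - \theta_1\gamma_{ee}(1) - \theta_2\gamma_{ee}(0) - \cdots - \theta_M\gamma_{ee}(M-2) &= -\bar{\theta}_2\gamma_{e\epsilon}(0) - \cdots - \bar{\theta}_M\gamma_{e\epsilon}(M-2) \\
&\ \,\vdots \\
\gamma_{ee}(M) - \theta_1\gamma_{ee}(M-1) - \theta_2\gamma_{ee}(M-2) - \cdots - \theta_M\gamma_{ee}(0) &= -\bar{\theta}_M\gamma_{e\epsilon}(0)
\end{aligned}$$

or again in a matrix form

$$\left\{ \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\theta_1 & 1 & 0 & \cdots & 0 \\ -\theta_2 & -\theta_1 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -\theta_M & -\theta_{M-1} & \cdots & -\theta_1 & 1 \end{bmatrix} - \begin{bmatrix} 0 & \theta_1 & \theta_2 & \cdots & \theta_{M-1} & \theta_M \\ 0 & \theta_2 & \theta_3 & \cdots & \theta_M & 0 \\ 0 & \theta_3 & \theta_4 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & \theta_M & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix} \right\} \begin{bmatrix} \gamma_{ee}(0) \\ \gamma_{ee}(1) \\ \gamma_{ee}(2) \\ \vdots \\ \gamma_{ee}(M) \end{bmatrix} = \begin{bmatrix} 1 & -\bar{\theta}_1 & -\bar{\theta}_2 & \cdots & -\bar{\theta}_M \\ -\bar{\theta}_1 & -\bar{\theta}_2 & -\bar{\theta}_3 & \cdots & 0 \\ -\bar{\theta}_2 & -\bar{\theta}_3 & -\bar{\theta}_4 & \cdots & 0 \\ \vdots & & & & \vdots \\ -\bar{\theta}_M & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} \gamma_{e\epsilon}(0) \\ \gamma_{e\epsilon}(1) \\ \gamma_{e\epsilon}(2) \\ \vdots \\ \gamma_{e\epsilon}(M) \end{bmatrix}. \tag{35}$$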
Now, if we claim that e(t) converges to $\epsilon(t)$, it means that

$$\gamma_{e\epsilon}(\tau) = \delta(\tau)\sigma^2 = \begin{cases} \sigma^2 & \text{for } \tau = 0 \\ 0 & \text{for } \tau \ne 0 \end{cases}$$

and

$$\gamma_{ee}(\tau) = \delta(\tau)\sigma^2.$$

Then, substituting this in (34) or (35) results in

$$\theta_j = \bar{\theta}_j \quad \text{for } j = 1, 2, \ldots, M,$$

which is Condition 2. Hence this condition is necessary as claimed.


APPENDIX B 


Possible Transformation of Multiplicative Seasonal Effects Into Additive 
Seasonal Effects 


Suppose we are given data of the noise-free form

$$y(t) = (a + bt)\,c(t), \qquad c(t+L) = c(t), \tag{36}$$

which has linear trend and multiplicative seasonal effects.
Let

$$z(t) = \log[y(t)]. \tag{37}$$

Then substitution of (36) gives (if we assume $bt \ll a$, which is true in
most real data):

$$z(t) = \log a + \log\Big(1 + \frac{b}{a}t\Big) + \log c(t) \approx \log a + \frac{b}{a}t + \log c(t) = \bar{a} + \bar{b}t + \bar{c}(t), \tag{38}$$

where

$$\bar{a} = \log a, \qquad \bar{b} = \frac{b}{a}, \qquad \bar{c}(t) = \log c(t).$$


Hence z(t) clearly has the form of data with linear trend and additive 
seasonal effects. 
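The quality of the approximation in (38) is easy to gauge numerically. The sketch below uses made-up values of a, b, and the seasonal factors (all assumptions chosen so that bt is at most one tenth of a) and reports the largest gap between the exact logarithm and the linear-plus-additive-seasonal form.

```python
import math

a, b, period = 100.0, 0.5, 12             # illustrative values with b*t << a
seasonal = [1.0 + 0.2 * math.sin(2 * math.pi * k / period) for k in range(period)]


def y(t):
    """Noise-free data with linear trend and multiplicative seasonal effects, eq. (36)."""
    return (a + b * t) * seasonal[t % period]


def z_approx(t):
    """Linear trend plus additive seasonal effects, the right-hand side of eq. (38)."""
    return math.log(a) + (b / a) * t + math.log(seasonal[t % period])


worst = max(abs(math.log(y(t)) - z_approx(t)) for t in range(21))
print(worst)   # largest approximation error over t = 0, ..., 20
```

The error is that of replacing log(1 + u) by u with u = bt/a, so it is bounded by u²/2 and grows as the trend term becomes comparable to a.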
COSMOS, the Computer System for Main Frame Operations, is an operational
support system that inventories and assigns central office facilities
to serve customer circuits. As part of this assignment responsibility, COSMOS 
must provide the central office personnel who will physically connect the 
circuit not only with information about the facilities to be connected, but also 
the order in which they will be connected (i.e., connection sequence or 
“connectivity”). Also, COSMOS must determine the circuit connectivity to 
permit automatic assignment of tie pairs—inter- and intra-frame cables that 
permit the connection of facilities that are widely separated physically. A new 
algorithm has been added to COSMOS to permit the determination of con- 
nectivity. This algorithm is based on the algorithm that determines the 
minimum-weight spanning tree of a connected graph. However, the algorithm 
is specialized for COSMOS by taking into account such factors as minimizing 
the maximum number of connections at any node and restricting certain nodes 
to a maximum number of connections. 


I. INTRODUCTION 


When a mechanized system assigns facilities to provide a telephone 
circuit (to fulfill a request for service, say), it must accomplish three 
things. It must 

1. Determine which facility types are required to provide the service 


* Bell Laboratories. 
2. Select a particular unit which is available for each such facility
type

3. Determine the circuit topology of the assigned facilities. 

Steps 1 and 2 can be reduced to an algorithm using straightforward 
procedures. Step 3, however, has proved to be difficult and has been 
left for manual determination in all but the simplest cases. In this 
paper an algorithm is reported that has been successful in determining 
the circuit topology for most of the circuits encountered in telephony. 


II. A PARTICULAR APPLICATION: COSMOS


COSMOS is the name of a minicomputer system (DEC PDP 11/70 
and PDP 11/45) designed for use by telephone operating companies 
to assign and administer central office equipment.¹ A major impetus
for its development and continued deployment is to increase the 
efficiency of central office personnel who must physically connect, 
rearrange, and disconnect facilities to provide service to customers. 
Accordingly, an important feature of COSMOS is its ability to produce 
a report (called the Frame Output Report or “FOR”) for the central 
office personnel that clearly specifies what should be connected to 
what. Most circuits for which COSMOS must create a FOR are simple, 
i.e., only two facilities must be interconnected on the frame. Some 
circuits, however, can be quite complicated in that some facilities in 
the circuit must be interconnected in series while others must be 
connected in parallel. An example of such a case would be a circuit 
with a main line and an off-premises extension where the bridge point 
is in the central office. This example becomes more complex if signal 
conditioning equipment must be placed in series with each line. 

Such an example is illustrated in Fig. 1. This circuit includes a main 
line (cable pair 4-980) and three off-premises extensions (cable pairs 
4-981, 4-982, and 4-983). Each cable pair must be connected to the 
line equipment through a bridge lifter (BL 49, 50, 51, and 52). Since 
the bridge lifters are located on a different frame from the line 
equipment and the cable pairs, tie pairs (TP 107, 304, 305, 306, and 
307) must be used to interconnect all the components of this circuit. 

Since the FOR must unambiguously state how the connections are 
to be made, either the person establishing the order for service in 
COSMOS must provide the connection sequence (“connectivity”), or 
COSMOS itself has to be capable of determining the connectivity. In all
initial versions of COSMOS the connectivity had to be specified manually.
Starting about 1977, logic was added so COSMOS could automatically determine
connectivity in certain situations. The current generic of COSMOS 
(generic 9.0) is being developed to incorporate connectivity determi- 
nation in all cases but still allow the user to manually override the 
automatic connectivity logic if necessary. This paper presents the 
Fig. 1. Example of a main line with three off-premises extensions. (The figure
shows line equipment OE 000-007-401 and cable pairs CP 4-980, 4-981, 4-982, and
4-983. Legend: BL, bridge lifter; CP, cable pair; TP, tie pair. Each line is a
schematic representation of a pair of wires; frame connections are made by the
frame attendant.)


algorithm developed to achieve this capability and illustrates how it 
benefits the COSMOS user. 


III. REVIEW OF COSMOS CAPABILITIES


As we already mentioned, COSMOS accepts a service order as input 
and creates the FOR as output. The service order is input to COSMOS 
by a clerk in the Loop Assignment Center (LAC). Certain information
must be entered by the clerk so the order can be processed by 
COSMOS, while other information is optional, depending on the 
particular order. The required information is the order number and 
the order due date. If a switching equipment connection is to be 
assigned to the customer, then the switching equipment features and 
the customer class of service must be specified also. Specific facilities 
to be assigned to the customer can either be specified when entered or 
automatically assigned by COSMOS. Actually, the automatic assign- 
ment takes place in two levels, depending on the facilities: 1) COSMOS 
determines the need for the facility and then selects a particular 
facility for the circuit, or 2) the LAC clerk specifies the need for a 
facility on input and COSMOS selects a particular facility for the 
circuit. Table I lists the facilities administered by COSMOS and how 
they are selected for a particular circuit by COSMOS—i.e., manual 
specification of the particular facility, manual specification of the need 
for the facility, or complete automatic selection by COSMOS. Table I 
also specifies that some facilities are terminated on a Main Distrib- 
uting Frame (MDF), while others are not. The facilities that are not 
terminated on an MDF either have no physical termination (for 
Table I. COSMOS administered facilities

                             MDF Frame      Assignment Mode
Facility                     Termination    Manual   Need Specified   Full Auto
Telephone Number (TN)        No             Yes      Yes              No
Extra Number (XN)            No             Yes      No               No
Group (GP)                   No             Yes      No               No
Terminal (TER)               No             Yes      No               No
Relay (RLY)                  No             Yes      Yes              Yes
Message Register (MR)        No             Yes      Yes              Yes
Private Line Number (PL)     No             Yes      No               No
Special Equipment (SE)       No             Yes      No               No
Special Equipment (SE)       Yes            Yes      No               No
Cable Pair (CP)              Yes            Yes      No               No
Line Equipment (OE)          Yes            Yes      Yes              No
Concentrator (CON)           Yes            Yes      No               No
Tie Pair (TP)                Yes            Yes      Yes              Yes
Bridge Lifter (BL)           Yes            Yes      Yes              Yes
Trunk (TK)                   Yes            Yes      No               No


example, telephone numbers, groups, and terminals on an electronic 
switching system are software variables) while others are terminated 
on an intermediate distributing frame [such as relays and message 
registers on a No. 5 crossbar switching system (5XB)]. 

Appendix A describes an overview of the input language for the 
Service Order Establishment (SOE) transaction. An appreciation of 
the language is helpful in understanding the example presented in 
Appendix B, which shows the effect of the connectivity algorithm on 
the user input. The example in this appendix shows the service order 
input, as well as excerpts from the FOR to connect the circuit shown 
in Fig. 1. The detailed functioning of the connectivity algorithm for a 
particular example is described in Appendix C. 

So far, only orders resulting from customer requests for service have 
been described as input to COSMOS. Another major source of input
to COSMOS is work orders, i.e., orders initiated by the telephone
company personnel to change out defective equipment or to rearrange
circuits to accommodate growth. These transactions also use the 
connectivity algorithm. 


IV. THE CONNECTIVITY ALGORITHM 


The connectivity algorithm, which was first proposed by H. L. York,²
is based on the concept of a minimum-weight spanning tree of a 
connected graph. For each circuit whose connectivity is to be deter- 
mined, a graph whose nodes correspond to each of the elements of the 
circuit is constructed. The edges of this graph are assigned weights 
such that the smaller the weight the more likely the two circuit 
elements (nodes) are to be connected directly to each other. The final 
connections between the circuit elements is determined by finding a 
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spanning tree whose edges have a total weight less than or equal to all 
other spanning trees for this graph. Well-known methods are available 
for finding the minimum-weight spanning tree of a connected graph. 
One very straightforward algorithm is given by E. Horowitz and S. 
Sahni³ (see especially Section 6.2). An apparently more efficient 
algorithm plus other extensions of the spanning-tree concept is given 
by R. C. Prim.⁴ As this paper describes later, none of these algorithms 
could be applied directly to the COSMOS problem because of addi- 
tional side-constraints that had to be imposed on real circuits. These 
in turn led to a more efficient algorithm than can be obtained for the 
general case. The problem of determining circuit connectivity is now 
reduced to obtaining pair-wise connection weights for all facility 
combinations and to specifying the particular algorithm for calculating 
the minimum-weight spanning tree. These will each be discussed in 
turn. 
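
For reference, before the COSMOS-specific constraints are introduced, the following is a minimal sketch of an unconstrained minimum-weight spanning tree computation in the style of Prim's method. The function names and the four-node weight matrix are illustrative assumptions and are not taken from COSMOS.

```python
def prim_mst(weights):
    """Return the edges (i, j) of a minimum-weight spanning tree.

    weights is a symmetric matrix (list of lists); node 0 is the start node.
    """
    n = len(weights)
    in_tree = [False] * n
    in_tree[0] = True
    edges = []
    for _ in range(n - 1):
        best = None
        # Scan every edge that leaves the current tree and keep the cheapest.
        for i in range(n):
            if not in_tree[i]:
                continue
            for j in range(n):
                if in_tree[j]:
                    continue
                if best is None or weights[i][j] < weights[best[0]][best[1]]:
                    best = (i, j)
        in_tree[best[1]] = True
        edges.append(best)
    return edges


def total_weight(weights, edges):
    """Sum the weights of the selected edges."""
    return sum(weights[i][j] for i, j in edges)


# Hypothetical four-node graph used only to exercise the sketch.
example = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 6],
    [3, 5, 6, 0],
]
tree = prim_mst(example)
```

For this small graph the sketch selects three edges with a total weight of 6. The algorithm described in Section 4.3.2 differs from this general form because it also tracks outward-connection limits and modifies weights as edges are chosen.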


4.1 Determination of connective weights 


In Table I, two types of special equipment (SE) are noted: those 
with a frame location and those without. Even among SE terminated 
on the frame there are subgroupings that must be treated differently 
by the connectivity algorithm. These will now be described. 

The SE file in COSMOS contains “miscellaneous” equipment that 
is not explicitly recorded in any of the other COSMOS equipment 
files. The name of the SE is created during the order input and a 
record for the SE is allocated at that time. When an order to disconnect 
the circuit is established and completed, the record allocated to this 
SE is released to a list of free records. During input of the name of 
the SE, the frame location (if one exists) is input also. The SE receives 
special treatment by the connectivity algorithm, depending on the SE 
name and the presence or absence of a frame location. The various 
subcategories of SE are shown in Table II. 
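
The subcategory rules of Table II can be sketched as a simple classifier. This is a hedged illustration only: the function name, the returned labels, and the sample SE names and frame locations are assumptions, not COSMOS identifiers.

```python
def classify_special_equipment(name, frame_location=None):
    """Map an SE name and optional frame location to a Table II action."""
    if not frame_location:
        return "IGNORE"   # no frame location: ignored for connectivity
    if name.startswith("RE"):
        return "REG"      # repeater with gain
    if name.startswith("DL"):
        return "DLL"      # dial Long Lines unit
    if name.startswith("VR"):
        return "VR"       # voice repeater
    if name.startswith("DPP-") or name.startswith("."):
        return "TRUNK"    # treated as a trunk
    return "SE"           # anything else


# Hypothetical names and frame locations, chosen only to hit each rule.
samples = {
    ("RE100", "F01"): classify_special_equipment("RE100", "F01"),
    ("DPP-7", "F01"): classify_special_equipment("DPP-7", "F01"),
    ("XYZ", None): classify_special_equipment("XYZ", None),
}
```

The order of the checks matters: the frame-location test comes first, so an SE named with an RE prefix but entered without a frame location is still ignored.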

Table II—Subcategories of special equipment 


Input                                   Action 

If no frame location is input           Ignore SE for connectivity determination 
First two characters are RE             Treat as a REG (repeater with gain) 
First two characters are DL             Treat as a DLL (dial Long Lines) unit 
First two characters are VR             Treat as a VR (voice repeater); an example 
                                        would be an E6 repeater 
First four characters are DPP- or       Treat as a trunk 
character string begins with “.” 
Anything else                           Treat as an SE 


With these subdivisions of the equipment that can be represented 
in the SE field, plus the other facilities that have frame terminations 
as listed in Table I, the user can construct a complete list of facility 
types that must be processed by the connectivity algorithm. The next 
step is to construct a matrix whose rows and columns represent each 
of these facilities and whose elements are the numerical weights 
associated with how likely the two facilities are to be connected directly 
to one another. This will be referred to as the generalized weight table. 
When the algorithm is presented an actual list of elements that must 
be connected together, the weights for the graph constructed for this 
circuit will be obtained from the generalized weight table. 
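
The lookup from facility types to per-circuit edge weights can be sketched as follows. Only the five type-pair weights that appear in the worked example of Appendix C are included; the dictionary layout and function name are assumptions made for illustration.

```python
MAX_WEIGHT = 100

# Type-pair weights as they appear in the Appendix C example.
GENERALIZED = {
    frozenset(["BL"]): 5,          # BL to BL
    frozenset(["BL", "CP"]): 60,
    frozenset(["BL", "OE"]): 35,
    frozenset(["CP"]): 70,         # CP to CP
    frozenset(["CP", "OE"]): 65,
}


def build_weight_table(facilities):
    """Build a symmetric weight table for a list of (name, type) pairs."""
    table = {}
    for name_a, type_a in facilities:
        row = {}
        for name_b, type_b in facilities:
            if name_a == name_b:
                # A facility cannot be connected to itself.
                row[name_b] = MAX_WEIGHT
            else:
                row[name_b] = GENERALIZED[frozenset([type_a, type_b])]
        table[name_a] = row
    return table


circuit = [("BL1", "BL"), ("BL2", "BL"), ("CP1", "CP"), ("CP2", "CP"), ("OE", "OE")]
weight_table = build_weight_table(circuit)
```

Keying the generalized table on an unordered pair of types keeps the per-circuit table symmetric without storing each weight twice.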

The generalized weight table is constructed as follows: 

1. All possible facility types are classified in four broad categories: 
switching equipment, conditioning equipment, metallic facilities, and 
tie pairs. In general, any circuit must be connected in the order: 
switching equipment, then conditioning equipment, then metallic facilities. Tie 
pairs are assigned as needed to facilitate these connections. 

2. Table III shows this classification of facilities. When assigning 
weights, the user should note that some conditioning equipment is 
likely to be connected directly to another conditioning equipment of 
the same type while other types would be unlikely to be connected 
directly to each other. For example, if several bridge lifters (BLs) were 
in the same circuit, they would likely all be connected together. 
Facilities of this type are noted as “bunching” on Table III. 

The range of weights is arbitrarily chosen to lie between zero and 
one hundred. With the considerations just described plus a review of 
many likely circuits, the generalized weight table shown in Table IV 
was developed. 

As mentioned earlier, for a particular circuit a graph is established 
and the weights for the edges are taken from the generalized weight 
table. After that step the weights are further modified if any of the 
following additional information applies to the circuit. 

1. If tie pairs are already present in the circuit (i.e., an existing 
circuit is being modified), then the two facilities connected by the tie 
pair are recorded in the tie pair record. The weight between these two 
facilities and the tie pair is reduced to a small value. 


Table III—Grouping of facilities 

I    Switching equipment       Line equipment 

II   Conditioning equipment    Bridge lifter (bunching) 
                               Special equipment (RE) 
                               Special equipment (DL) 
                               Special equipment (VR) 
                               Special equipment (bunching) 

III  Metallic facilities       Cable pair 
                               Trunk 
                               Concentrator 
                               Special equipment (DPP-) 
                               Special equipment (.) 


Table IV—Generalized weight table 


SE: SE: SE: SE: SE: 
BL DL RE VR SE CP TK . DPP- CO TP 
5 
50 90 
55 90 90 


60 25 25 5 25 7 70 70 70 
60 70 70 73 #.%7 7 7 75 75 90 
909 90 90 90 90 90 90 90 90 90 90 





2. If a party circuit is being processed, then the facilities associated 
with each party are identified by a party number. Those facilities that 
do not belong to the same party have their weights increased to a 
maximum value. 

3. If a circuit with one or more off-premises extensions is being 
processed, then those facilities belonging to the same “leg” will have a 
Different Premise Address (DPA) value assigned to them. Conse- 
quently, the weights are increased to a maximum value for those edges 
connecting facilities in different “legs”. 

4. If the circuit contains tie pairs, then the weights between facilities 
terminated on different frames will be increased somewhat. This is 
done to avoid assigning tie pairs unnecessarily. 
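
The effect of item 3 on the Appendix C example can be sketched as below. In that example only the BL-to-CP edges that cross legs are raised to the maximum, while the BL1-BL2 and CP1-CP2 weights are left alone, so the sketch assumes the rule applies to edges between a BL and a CP with different DPA values. That restriction is an inference from the example, not a rule stated in the text, and all names here are illustrative.

```python
MAX_WEIGHT = 100


def apply_leg_constraint(weights, kinds, dpa):
    """Return a copy of weights with cross-leg BL-CP edges set to MAX_WEIGHT.

    weights: dict of dicts; kinds: name -> type; dpa: name -> DPA value.
    """
    adjusted = {a: dict(row) for a, row in weights.items()}
    for a in weights:
        for b in weights[a]:
            if a == b:
                continue
            # Assumed scope of the rule: a BL paired with a CP in another leg.
            if {kinds[a], kinds[b]} == {"BL", "CP"} and dpa.get(a) != dpa.get(b):
                adjusted[a][b] = MAX_WEIGHT
    return adjusted


names = ["BL1", "BL2", "CP1", "CP2", "OE"]
rows = [
    [100, 5, 60, 60, 35],
    [5, 100, 60, 60, 35],
    [60, 60, 100, 70, 65],
    [60, 60, 70, 100, 65],
    [35, 35, 65, 65, 100],
]
before = {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
kinds = {"BL1": "BL", "BL2": "BL", "CP1": "CP", "CP2": "CP", "OE": "OE"}
dpa = {"BL1": " ", "CP1": " ", "BL2": "999", "CP2": "999"}
after = apply_leg_constraint(before, kinds, dpa)
```

With these inputs the BL1-CP2 and BL2-CP1 entries become 100 and every other entry is unchanged, which matches the Step 2 weight table in Appendix C.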


4.2 Fundamental considerations 


There are two special conditions that apply to determining circuits 
for central office facilities that do not apply to circuit connectivity in 
general. Not only do these conditions enable COSMOS to determine 
the correct configuration, but their use speeds up the algorithm as well 
since some nodes can be eliminated from consideration after a certain 
point in the algorithm has been reached. These special conditions are: 

1. When a user is choosing among several minimum-weight spanning 
trees (MWSTs) (they need not be unique for a given graph), the 
tree with the minimum number of branches at the node with the most 
branches is preferred. An example of this is shown in Fig. 2. The 
algorithm does not actually calculate all possible MWSTs and then 
choose the one with the minimum number of branches at the node 
with the maximum number of branches. Instead, several strategies are 
employed, depending on the circuit being processed. If the circuit 
contains office equipment (OE), then most nodes have a maximum 
number of connections to them which is calculated before the MWST 
processing begins (as described in item 2 of this listing). The only 
exceptions are nodes that represent BLs. To prevent all BLs from 
connecting to one BL, the weights of all BL edges that terminate on a 
BL just connected and have not yet been selected for the MWST are 
incremented. For a circuit that contains no OE, the weights of all edges 
that terminate on a node already selected for the MWST and have not 
yet been selected themselves are incremented. These strategies direct the 
algorithm towards selecting the MWST with the required branch 
minimization. This requirement relates to how circuits are actually 
wired on the frame. If too many connections must be made at one 
terminal, the craftsperson may physically run out of room on the 
terminal and thus be unable to complete all the connections. Also, if 
an order is subsequently received to disconnect one of the legs of the 
circuit, proper “housekeeping” might require dismantling all connec- 
tions at a terminal and then reconnecting the remaining legs. This 
process is much simplified if the number of connections at a terminal 
is minimized. 

Fig. 2—The minimum-weight spanning tree showing (a) the graph; (b) a maximum 
of three branches; (c) a maximum of two branches. In this case (c) is the preferred tree. 
(CP: cable pair.) 

2. When an OE is present in the circuit, each facility is allowed a 
maximum number of “outward connections.” An outward connection 
is defined as a connection away from the OE. When an OE is present, 
it will be the root of the tree and therefore a direction away from the 
OE (root) is always defined. The maximum number of outward con- 
nections is determined by the following rules: 

(a) Metallic facilities (see Table III) have zero outward connections 
since they must always be at the outermost “tips” of the 
branches. 

(b) Conditioning equipment (see Table III) is allowed one outward 
connection. In determining outward connections, a connection 
between two BLs is not counted. This is because a BL will 
usually be at a branch point and therefore will have additional 
outward connections. 

(c) If no BLs are present in the circuit, the number of outward 
connections from the OE equals the number of metallic facilities. 
If BLs are present, the number of outward connections from the 
OE equals the number of metallic facilities minus the number 
of BLs plus one. This formula reflects the actual way in which 
such circuits are wired: If BLs are present, they are the bridge 
point instead of the OE. In fact, BLs are often hard-wired in 
parallel in anticipation of their use as bridge points. 
This set of rules (a through c) constitutes the principal reason for a 
speed-up of this algorithm over the general case, since once the 
maximum number of outward connections is achieved, a particular 
node no longer needs to be considered. 

3. When an edge is chosen for the MWST, its weight is increased 
to the maximum value (100). This was chosen as the most efficient 
method to signal the algorithm not to consider this edge for the MWST 
again. 
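
Rules (a) through (c) of item 2 can be sketched as a function that returns the maximum number of outward connections for each facility. The type labels and function name are illustrative assumptions; the exemption for BL-to-BL connections in rule (b) is applied while the tree is built, not here.

```python
def outward_connection_limits(kinds):
    """Return name -> maximum outward connections when an OE is present.

    kinds maps each facility name to "OE", "BL", "CONDITIONING", or "METALLIC".
    """
    metallic = sum(1 for k in kinds.values() if k == "METALLIC")
    bls = sum(1 for k in kinds.values() if k == "BL")
    limits = {}
    for name, kind in kinds.items():
        if kind == "METALLIC":
            limits[name] = 0                      # rule (a): always at the tips
        elif kind in ("BL", "CONDITIONING"):
            limits[name] = 1                      # rule (b)
        elif kind == "OE":
            # rule (c): BLs, when present, replace the OE as bridge points
            limits[name] = metallic if bls == 0 else metallic - bls + 1
        else:
            raise ValueError(f"unknown facility type: {kind}")
    return limits


appendix_c = {"BL1": "BL", "BL2": "BL", "CP1": "METALLIC", "CP2": "METALLIC", "OE": "OE"}
limits = outward_connection_limits(appendix_c)
```

For the Appendix C facilities this gives one outward connection for the OE (2 - 2 + 1), one for each BL, and zero for each cable pair.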


4.3 Detailed description 


This section describes determining the list of facilities to be con- 
nected, the actual algorithm, and how the output list of facilities in 
connectivity order is assembled from the internal tables populated by 
the connectivity algorithm. This breakdown parallels the construction 
of the actual software. 


4.3.1 The list of facilities to be connected 


Connectivity processing is initiated when another COSMOS module 
determines that connectivity must be established. If this is the case, 
the connectivity module is invoked and a list of facilities is presented 
to it. Before this list can be passed along to the connectivity algorithm, 
certain facilities must be “weeded out”. 

There are two types of facilities that must be excluded from con- 
nectivity considerations. The first type includes facilities that have no 
mainframe terminations. These facilities are telephone numbers 
(TNs), No. 1XB coded terminals (XNs), No. 5XB relays, electronic 
switching system groups and terminals (GP and TER), and special 
equipment (SEs) for which no frame termination has been entered. 

The second type of facility that must be excluded is frame-termi- 
nated facilities that will not be in the circuit at the time that the order 
being processed will be worked. This situation can arise because 
COSMOS allows multiple orders to be established on the same circuit 
if they are logically consistent with one another. Thus order number 
1 with due date X may be removing facilities from an established 
circuit while order number 2 (the one being processed, say) with due 
date Y is adding facilities to the same circuit. If due date X precedes 
due date Y, then in processing order 2 the facilities being removed by 
order 1 should not be considered. However, if due date X is later than 
due date Y, then all facilities must be considered in processing order 
2. 

With these two considerations, a list of facilities is prepared for 
processing by the connectivity algorithm. After connectivity is deter- 
mined for these facilities, those facilities that were excluded are added 
to the end of the list of facilities that were placed in connectivity order. 
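
The two exclusion rules of this section can be sketched as a single filtering pass. The record layout, the function name, and the frame labels are assumptions; the text does not say how equal due dates are handled, so the sketch excludes a facility only when its removal is due strictly earlier than the order being processed.

```python
from datetime import date

# Facility types that never have a mainframe termination (Section 4.3.1).
NO_FRAME_TYPES = {"TN", "XN", "GP", "TER"}


def select_for_connectivity(facilities, removal_due, order_due):
    """Split facilities into (to_connect, excluded) lists of names.

    facilities: list of (name, type, frame_location or None)
    removal_due: name -> due date of another order that removes the facility
    order_due: due date of the order being processed
    """
    to_connect, excluded = [], []
    for name, kind, frame in facilities:
        unterminated = kind in NO_FRAME_TYPES or frame is None
        # Assumption: only a strictly earlier removal takes the facility out.
        already_gone = name in removal_due and removal_due[name] < order_due
        if unterminated or already_gone:
            excluded.append(name)
        else:
            to_connect.append(name)
    return to_connect, excluded


facilities = [
    ("OE 000-007-401", "OE", "F1"),   # "F1" is a hypothetical frame label
    ("TN 111-1096", "TN", None),
    ("CP 4-0980", "CP", "F1"),
    ("CP 4-0981", "CP", "F1"),
]
kept, dropped = select_for_connectivity(
    facilities, {"CP 4-0981": date(1981, 7, 15)}, date(1981, 8, 1)
)
```

The excluded names are returned separately so that they can be appended to the end of the ordered list once connectivity has been determined.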


4.3.2 The algorithm itself 


The first step performed by the algorithm is to identify the equip- 
ment types that have been presented to it. It then proceeds to calculate 
the connection weights for all the edges of the graph describing the 
circuit using the considerations outlined in Section 4.1. These weights 
are stored in a weight table. Next the actual MWST processing begins. 
This is facilitated by updating a “working” table. Each row of the table 
contains the following information: facility, count of connections to 
the facility, available outward connections, lowest connection cost, 
and the facility connected by the “lowest connection cost” edge. Also, 
as the algorithm proceeds, a third table, the connection list, is created. 
The connection list table maintains a list of the edges selected for the 
MWST. 

The “working” table is populated as follows: each input facility is 
placed into the table. Initially, the count of all connections to the 
facility is set to zero for each facility. The available outward connec- 
tions for each facility are determined based on the considerations 
described in item 2 of Section 4.2. The lowest cost connection and the 
corresponding facility are determined by scanning the weight table for 
each facility. In case of a tie the first edge encountered in the weight 
table is chosen for inclusion in the working table. 

Now the first facility to be placed in the circuit must be chosen. If 
there is an OE in the working table, it is chosen as the first facility; 
otherwise the first facility in the working table is chosen. The first 
facility and the facility it is connected to in the working table are 
placed in the connection list. 

In the following, the first facility is taken as a facility appearing in 
the connection list. While there are still facilities that have not been 
connected to the circuit, the following instructions are repeated: 

1. Choosing among the facilities already in the circuit (i.e., in the 
connection list), find the facility in the working table with the lowest 
cost connection. In case of a tie take the facility that appears first in 
the working table. The facility connected by the lowest cost connection 
edge will be referred to as the “new facility”; the original facility will 
be called the “old facility”. 

2. If the number of connections to the new facility is not zero, this 
edge cannot be part of the MWST or else a cycle would be formed. 
Skip to instruction 8 below. 

3. Add this connection to the connection list. 

4. Increment the number of connections for the two facilities. 

5. If the circuit contains an OE, decrement the number of available 
outward connections for the old facility unless both facilities are BLs. 
If both facilities are BLs, add one to the cost of all edges in the weight 
table that emanate from the old BL. This will reduce the maximum 
number of connections made at one bridge point, as explained in 
Section 4.2. 

6. If the circuit contains an OE, and if either the old or the new 
facility (or both) have zero outward connections available, change the 
costs in the weight table for all edges emanating from such a node to 
a maximum value. 

7. If the circuit does not contain an OE, and the old facility has two 
or more connections, add one to the cost of all edges in the weight 
table that emanate from the old facility node. 

8. Change the cost of the edge in the weight table that connects the 
old and the new facility to a maximum value. 

9. Reestablish the working table based on the new weight table 
costs. 
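
Instructions 1 through 9 can be sketched in a few dozen lines. This is an illustrative reading of the steps, not the COSMOS code: the working table is recomputed on demand instead of being stored, "maximum value" is taken to mean raising a weight to at least 100, and the iteration guard is an addition to keep the sketch from looping if no acceptable edge remains. The input data are the Step 2 weights and initial AOC values of the Appendix C example.

```python
MAX_WEIGHT = 100


def connect(facilities, kinds, weights, aoc):
    """Return the connection list as (old, new) pairs; weights and aoc are modified."""
    has_oe = any(kinds[f] == "OE" for f in facilities)
    count = {f: 0 for f in facilities}
    first = next((f for f in facilities if kinds[f] == "OE"), facilities[0])
    in_circuit = {first}
    connections = []

    def adjust(node, to_max):
        # Raise all edges at node to the maximum, or add one to each of them.
        for other in facilities:
            if other != node:
                w = weights[node][other]
                w = max(w, MAX_WEIGHT) if to_max else w + 1
                weights[node][other] = weights[other][node] = w

    def lowest(node):
        # Working-table entry: cheapest neighbor, first one wins a tie.
        return min((f for f in facilities if f != node), key=lambda f: weights[node][f])

    guard = len(facilities) ** 2 + len(facilities)
    while len(connections) < len(facilities) - 1:
        guard -= 1
        if guard < 0:
            raise RuntimeError("no acceptable edge left")
        # Instruction 1: cheapest connection among facilities already placed.
        old = min((f for f in facilities if f in in_circuit),
                  key=lambda f: weights[f][lowest(f)])
        new = lowest(old)
        if count[new] == 0:                      # instruction 2: avoid cycles
            connections.append((old, new))       # instruction 3
            in_circuit.add(new)
            count[old] += 1                      # instruction 4
            count[new] += 1
            if has_oe:
                if kinds[old] == "BL" and kinds[new] == "BL":
                    adjust(old, to_max=False)    # instruction 5, BL-BL case
                else:
                    aoc[old] -= 1                # instruction 5
                for node in (old, new):          # instruction 6
                    if aoc[node] <= 0:
                        adjust(node, to_max=True)
            elif count[old] >= 2:                # instruction 7
                adjust(old, to_max=False)
        # Instruction 8: never consider this edge again.
        weights[old][new] = weights[new][old] = max(weights[old][new], MAX_WEIGHT)
    return connections


names = ["BL1", "BL2", "CP1", "CP2", "OE"]
rows = [
    [100, 5, 60, 100, 35],
    [5, 100, 100, 60, 35],
    [60, 100, 100, 70, 65],
    [100, 60, 70, 100, 65],
    [35, 35, 65, 65, 100],
]
weights = {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
kinds = {"BL1": "BL", "BL2": "BL", "CP1": "CP", "CP2": "CP", "OE": "OE"}
aoc = {"BL1": 1, "BL2": 1, "CP1": 0, "CP2": 0, "OE": 1}
result = connect(names, kinds, weights, aoc)
```

Run on these inputs, the sketch selects OE-BL1, BL1-BL2, BL2-CP2, and BL1-CP1 in that order, which is the connection list reached in Step 13 of Appendix C.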

We may now assume that all facilities have been placed in the 
connection list. (Note that if there are N facilities to be connected, 
there will be N-1 entries in the connection list so that the end of the 
algorithm is readily detected.) Now the connection list must be con- 
verted to a linear list. A tree will be described by a linear list that 
enumerates each branch, one after another. The beginning of a new 
branch is detected by the repetition of a facility that already appears 
higher up on the list (the branch point). 

The algorithm for creating the linear list makes use of the working 
table left over from the MWST algorithm and the connection list. The 
algorithm 

1. Searches the working table (in reverse order) until a facility is 
found with only one connection. This facility is one end of a branch. 
It places the facility in the linear list. 

2. Searches the connection list (in reverse order) for the facility just 
placed in the linear list. It places the facility connected to it in the 
linear list. 

3. Decrements the connection count for both the old and the new 
facilities. It removes their connection from the connection list. 

4. If the connection count for the new facility is greater than zero, 
it repeats Steps 2 and 3. If the connection count for the new facility 
is zero, the end of the current branch has been reached. It will then 
go to the next step. 

5. For each facility already on the linear list, it determines the 
number of remaining connections in the working table. If all connec- 
tions are zero, the linear list is complete. Otherwise, it selects the first 
facility encountered with a nonzero connection count. 

6. Enters this facility in the linear list and proceeds to Step 2. 

Note that the lists in Steps 1 and 2 are searched in reverse order so 
that the frame instructions are in a more “pleasing” sequence: line 
equipment first, then the first “leg”, then the second “leg”, etc. 
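
The six steps for building the linear list can be sketched as follows, using the connection list from the Appendix C example as input. The function name is an assumption, and the connection counts are recomputed from the connection list here in place of reading them from the working table.

```python
def linearize(facilities, connections):
    """Turn a spanning-tree connection list into a branch-by-branch linear list."""
    count = {f: 0 for f in facilities}
    for a, b in connections:
        count[a] += 1
        count[b] += 1
    remaining = list(connections)

    # Step 1: scan the working table in reverse for a facility with one connection.
    current = next(f for f in reversed(facilities) if count[f] == 1)
    linear = [current]
    while True:
        # Steps 2 to 4: follow the current branch until it ends.
        while count[current] > 0:
            idx = next(i for i in range(len(remaining) - 1, -1, -1)
                       if current in remaining[i])
            a, b = remaining.pop(idx)
            new = b if a == current else a
            linear.append(new)
            count[current] -= 1
            count[new] -= 1
            current = new
        # Step 5: find the first listed facility that still has connections.
        current = next((f for f in linear if count[f] > 0), None)
        if current is None:
            return linear
        # Step 6: repeating the branch point starts a new branch.
        linear.append(current)


names = ["BL1", "BL2", "CP1", "CP2", "OE"]
connection_list = [("OE", "BL1"), ("BL1", "BL2"), ("BL2", "CP2"), ("BL1", "CP1")]
linear_list = linearize(names, connection_list)
```

For this input the sketch yields OE, BL1, CP1, BL1, BL2, CP2: the first leg runs from the OE to CP1, and the repeated BL1 marks the branch point from which the second leg runs to CP2.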

The connectivity algorithm is now complete. The facilities that were 
excluded from consideration at the start of the algorithm can be added 
to the end of the list. 

The steps just described are applied to a particular example in 
Appendix C. 


V. ACKNOWLEDGMENTS 


As we mentioned earlier, the basic idea of applying the MWST 
algorithm to the problem of determining circuit connectivity in COS- 
MOS is due to H. L. York, who also programmed the original version. 
The additional modifications for circuits containing an OE were 
conceived of and designed by J. B. Sharpless. The algorithm was 
rewritten by E. W. Merrill, who worked under Sharpless’ direction.⁵ 
The current “owner” of the program is D. P. Bates. 


REFERENCES 


1. B. Bittner, “Computer System For Main Frame Operations (COSMOS),” Proc. 
IEEE Int. Conf. Commun., J (1976), pp. 13-20. 

2. H. L. York, unpublished work. 

3. E. Horowitz and S. Sahni, “Fundamentals of Data Structures,” Woodland Hills, 
CA: Computer Science Press, Inc., 1976. 

4. R. C. Prim, “Shortest Connection Networks and Some Generalizations,” B.S.T.J., 
36, No. 6 (November 1957), pp. 1389-1401. 

5. J. B. Sharpless and E. W. Merrill, unpublished work. 


APPENDIX A 
COSMOS Service Order Language 


When COSMOS is ready to accept a command, it will print a 
prompt (%). Immediately preceding the prompt character two alpha- 
numeric characters are printed. These two characters represent the 
wire center with whose facilities the user wishes to work. The wire 
center is identified by the user at log-in time. 

After the prompt letters have been printed the user can enter the 
transaction name. All service orders are initiated in COSMOS through 
the transaction SOE (Service Order Establishment). Particular inputs 
to SOE are established on separate lines, as many as are needed to 
specify the order. 

The first character of the first input line must be an H (standing 
for header). The remainder of the line contains general data pertaining 
to the order. Typical data items that appear on this line are the order 
number (identified by the prefix ORD), the order type (OT), and the 
due date (DD). The prefix-data groupings are separated by a virgule 
(/). This applies to all line types, not just to the H line. 

If all the data do not fit on the first H line, they may be continued 
on subsequent H lines. Once all the header data have been entered, 
facilities to be connected on the order are entered on a line (or lines) 
whose first character is I (standing for “in”). Facilities to be discon- 
nected by the order are entered on a line (or lines) whose first character 
is O (standing for “out”). Typical data items that appear on I or O 
lines are Cable Pair (CP), Telephone Number (TN), Office Equipment 
(OE), Universal Service Order Code (US), features (FEA), Telephone 
Number Exchange code (NNX), and Resistance Zone (RZ). 

In the case of facilities that are automatically assigned by COSMOS, 
the facility prefix may be followed by a question mark (?). This is a 
signal to COSMOS to assign the facility automatically. For example, 
if COSMOS is to select a telephone number somewhere on the I line 
the construction 


should appear. However, some wire centers contain several different 
switching entities. To distinguish among them the user is instead 
required to specify the exchange code. In this case automatic telephone 
number selection is triggered by the input. 


.../NNX 851/... 


When all I and/or O lines have been input the user types a “.” on a 
single line. At this point processing of the order commences. It should 
also be noted that as each line is entered, rudimentary checks are 
performed. When this processing is completed COSMOS prints an 
underscore (_) as a prompt to indicate that the next line can be 
processed. 
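
The line structure described in this appendix (a one-letter line type followed by prefix-data groupings separated by virgules) can be sketched as a small parser. The function name and return shape are assumptions made for illustration; the rudimentary checks that COSMOS performs on each line are not modeled.

```python
def parse_soe_line(line):
    """Split one SOE input line into its type and (prefix, data) groupings."""
    line = line.strip()
    kind, rest = line[0], line[1:].strip()
    if kind not in ("H", "I", "O"):
        raise ValueError(f"unknown line type: {kind!r}")
    items = []
    for group in rest.split("/"):
        group = group.strip()
        if not group:
            continue
        # The prefix is the first word; the remainder is its data.
        prefix, _, data = group.partition(" ")
        items.append((prefix, data.strip()))
    return kind, items


header = parse_soe_line("H ORD NAS0789/OT NC/DD 8-1-81")
in_line = parse_soe_line("I CP 4-980/RZ 22")
```

Here `header` holds the line type H with the ORD, OT, and DD groupings, and `in_line` holds the line type I with the CP and RZ groupings.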


APPENDIX B 
An Example of Automatic Connectivity Determination 


In this case COSMOS will be asked to process order number 
NAS0789. This is a new connect order (OT NC) and has a due date 
of August 1, 1981. The exchange code is 111 and COSMOS is to assign 
the line equipment with a Universal Service Order Code (US) of 1FR 
and features (FEA) consisting of Touch-Tone* service (T), nonsleeve 
lead (N), nonessential (N), and loop start (L). Four cable pairs are to 
be assigned to the order—pairs 980, 981, 982, and 983 in cable 4. The 
resistance zones of these pairs are 22, 11, 12, and 13, respectively. A 
parameter is maintained in the database to indicate whether bridge 
lifters are needed. If any one of these resistance zones exceeds this 
parameter, then all pairs will be assigned bridge lifters. In this case 
the parameter is set to 18, a value exceeded by the resistance zone of 
the first pair. 
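
The bridge-lifter rule just described reduces to a single comparison, sketched here with an assumed function name:

```python
def bridge_lifters_needed(resistance_zones, threshold):
    """True if any pair's resistance zone exceeds the database parameter."""
    return any(rz > threshold for rz in resistance_zones)


# Resistance zones and parameter from the order in this appendix.
needed = bridge_lifters_needed([22, 11, 12, 13], 18)
```

Because the first pair's zone of 22 exceeds 18, the result is true and all four pairs are assigned bridge lifters.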

The input and the SOE response are as follows: 

90% SOE 
H ORD NAS0789/OT NC/DD 8-1-81 
_I NNX 111/OE ?/US 1FR/FEA TNNL 
_I CP 4-980/RZ 22 
_I CP 4-981/RZ 11 
_I CP 4-982/RZ 12 
_I CP 4-983/RZ 13 
_. SO000122 
ORD NAS0789 
IN: CP 4-0980 
IN: CP 4-0981 
IN: CP 4-0982 
IN: CP 4-0983 
IN: OE 000-007-401 
IN: TN 111-1096 
IN: BL 49 
IN: BL 51 
IN: BL 50 
IN: BL 52 
IN: TP CM11-0107 
IN: TP CM11-0304 
IN: TP CM11-0305 
IN: TP CM11-0306 
IN: TP CM11-0307 
*TRANSACTION COMPLETED 

90% 

The string “SO000122” immediately following the period is the 
record number in the service order file selected by COSMOS to hold 
information about the order. This record number is useful in the event 
the order is not established properly and manual corrective action is 
required. 

* Registered service mark of AT&T. 

The rest of the SOE output consists of COSMOS assignments. First the 
four cable pairs are echoed back. These are followed by eleven auto- 
matically assigned facilities: an office equipment, a telephone number, 
four bridge lifters, and five tie pairs. (The tie pairs are needed to 
interconnect the bridge lifters and the office equipment and cable 
pairs since the bridge lifters are terminated on a different frame.) The 
facilities are listed by SOE in the order in which they are assigned. 
This is not the connectivity order. 

To show the connectivity order the frame output report must be 
executed. This is the report used by telephone company personnel to 
actually wire the circuit in the central office. The report itself is in a 
lengthy format for ease of reading. Instead of reproducing the entire 
report here, only excerpts that show connectivity are listed below: 


LINE EQP   IN  000-007-401 
TIE PAIR   IN  CM11-0107 
MISC EQP   IN  BL 49 
TIE PAIR   IN  CM11-0304 
CABLE PR   IN  4-0980 
*MISC EQP  IN  BL 49 
MISC EQP   IN  BL 51 
TIE PAIR   IN  CM11-0305 
CABLE PR   IN  4-0981 
*MISC EQP  IN  BL 51 
MISC EQP   IN  BL 50 
TIE PAIR   IN  CM11-0306 
CABLE PR   IN  4-0982 
*MISC EQP  IN  BL 50 
MISC EQP   IN  BL 52 
TIE PAIR   IN  CM11-0307 
CABLE PR   IN  4-0983 


Note the first leg of the circuit—extending from OE 000-007-401 to 
CP 4-0980. The beginning of the next leg is indicated by the asterisk 
(*) and the repetition of the facility BL 49. This is the first bridge 
point. This leg extends down to CP 4-0981. Now BL 51 is shown as 
the next bridge point. Notice that BL 49 is not the bridge point for all 
legs. This is the effect of the algorithm described in Section 4.2 to 
minimize the maximum number of legs emanating from a single bridge 
point. The remaining two legs extend from BL 51 to CP 4-0982 and 
BL 50 to CP 4-0983. 
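
The way the excerpt is read, with each repeated facility marking a bridge point and the start of a new leg, can be sketched as a routine that splits a linear list into legs. The function name and the short facility labels are illustrative assumptions.

```python
def split_into_legs(linear):
    """Split a linear list into legs; a repeated facility starts a new leg."""
    seen = set()
    legs = []
    for facility in linear:
        if not legs or facility in seen:
            legs.append([facility])      # first entry, or a repeated bridge point
        else:
            legs[-1].append(facility)
        seen.add(facility)
    return legs


frame_order = [
    "OE 000-007-401", "TP CM11-0107", "BL 49", "TP CM11-0304", "CP 4-0980",
    "BL 49", "BL 51", "TP CM11-0305", "CP 4-0981",
    "BL 51", "BL 50", "TP CM11-0306", "CP 4-0982",
    "BL 50", "BL 52", "TP CM11-0307", "CP 4-0983",
]
legs = split_into_legs(frame_order)
```

Applied to the excerpt, the routine returns four legs, each ending on one of the four cable pairs, with BL 49, BL 51, and BL 50 each serving as the bridge point for exactly one additional leg.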


APPENDIX C 
An Example of the Algorithm’s Execution 


The algorithm described in Section 4.3.2 will be followed in detail 
for a particular set of facilities: two bridge lifters (BL1 and BL2), two 
cable pairs (CP1 and CP2), and one line equipment (OE). The first 
step is to determine the connection weights for all the edges of the 
graph. These weights are determined from Table IV. Note that the 
diagonal terms are given a weight of 100, since a facility cannot be 
connected to itself. 


Step 1—Weight table 

       BL1     BL2     CP1     CP2     OE 
BL1    100       5      60      60     35 
BL2      5     100      60      60     35 
CP1     60      60     100      70     65 
CP2     60      60      70     100     65 
OE      35      35      65      65    100 





In this particular case CP1 and BL1 have been assigned a DPA 
value of “ ” (i.e., a blank) and CP2 and BL2 have been assigned a 
DPA value of “999” by a previously invoked load module of SOE. Thus 
those edges connecting facilities in different “legs” (i.e., BL1-CP2 and 
BL2-CP1) have their weights changed to a maximum value. (In the 
next and in all following tables entries that have changed from the 
previous table are enclosed in parentheses.) 


Step 2—Weight table 


       BL1     BL2     CP1     CP2     OE 
BL1    100       5      60   (100)     35 
BL2      5     100   (100)      60     35 
CP1     60   (100)     100      70     65 
CP2  (100)      60      70     100     65 
OE      35      35      65      65    100 





Now the working table is constructed. Each facility has an entry 
and the connection count is initially set to zero. The Available Outward 
Connections (AOC) equal zero for the two metallic facilities (CP1 and 
CP2), one for the two conditioning facilities (BL1 and BL2), 
and one for the OE, based on the formula: 


AOC = (number of metallic facilities) - (number of BLs) + 1 
    = 2 - 2 + 1 = 1 


The lowest connection cost and corresponding facility are obtained 
from the weight table. 


Step 3—Working table 


                        Available      Lowest 
            Connection  Outward        Connection  Corresponding 
Facility    Count       Connections    Cost        Facility 
BL1         0           1              5           BL2 
BL2         0           1              5           BL1 
CP1         0           0              60          BL1 
CP2         0           0              60          BL2 
OE          0           1              35          BL1 


Since the circuit contains an OE, this facility is chosen first and 
placed on the connection list. 


Step 4—Connection list 
OE-BL1 


The number of connections to the OE and BL1 are incremented 
(Step 6, working table). The number of AOC to the old facility (the 
OE) is decremented (Step 6, working table). The old facility now has 
zero AOC so the weight of all edges emanating from it is changed to a 
maximum value (Step 5, weight table). Finally, the working table is 
modified due to changes in the weight table. 


Step 5—Weight table 


       BL1     BL2     CP1     CP2      OE 
BL1    100       5      60     100   (100) 
BL2      5     100     100      60   (100) 
CP1     60     100     100      70   (100) 
CP2    100      60      70     100   (100) 
OE   (100)   (100)   (100)   (100)     100 


Step 6— Working table 


                        Available      Lowest 
            Connection  Outward        Connection  Corresponding 
Facility    Count       Connections    Cost        Facility 
BL1         (1)         1              5           BL2 
BL2         0           1              5           BL1 
CP1         0           0              60          BL1 
CP2         0           0              60          BL2 
OE          (1)         (0)            (100)       BL1 


If we choose among the facilities already in the connection list (OE 
and BL1), the one with the lowest connection cost in the weight table 
is the first entry. The edge BL1-BL2 is added to the connection list. 


Step 7—Connection list 


OE-BL1 
BL1-BL2 


Since the connection count to BL2 is zero, this is an acceptable 
choice. The number of connections to BL1 and BL2 are incremented 
(Step 9, working table). However, the number of AOC to the old 
facility (BL1) is not decremented, since both facilities are BLs. In- 
stead, one is added to the cost of all edges that emanate from BL1 
(Step 8, weight table). Since neither the old nor the new facility has 
zero AOC, the edges emanating from these nodes do not have their 
weights set to 100. However, the BL1-BL2 weights are set to the 
maximum value (Step 8, weight table). Finally, the working table is 
modified according to changes in the weight table. 


Step 8—Weight table 


       BL1     BL2     CP1     CP2      OE 
BL1  (101)   (100)    (61)   (101)   (101) 
BL2  (100)     100     100      60     100 
CP1   (61)     100     100      70     100 
CP2  (101)      60      70     100     100 
OE   (101)     100     100     100     100 


Step 9—Working table 


                        Available      Lowest 
            Connection  Outward        Connection  Corresponding 
Facility    Count       Connections    Cost        Facility 
BL1         (2)         1              (61)        (CP1) 
BL2         (1)         1              (60)        (CP2) 
CP1         0           0              (61)        BL1 
CP2         0           0              60          BL2 
OE          1           0              100         (BL2) 


If we choose among the facilities already on the connection list (OE, 
BL1, and BL2), the one with the lowest connection cost in the weight 
table is the BL2 entry. Therefore, BL2-CP2 is added to the connection 
list. 

Step 10—Connection list 


OE-BL1 
BL1-BL2 
BL2-CP2 


Since the connection count to CP2 is zero, this is an acceptable 
choice. The number of connections to BL2 and CP2 are incremented 
(Step 12, working table). The number of AOC to the old facility (BL2) 
is decremented (Step 12, working table). Since both BL2 and CP2 now 
have zero AOC, the weights for all edges emanating from BL2 and 
CP2 are set to the maximum value (Step 11, weight table). Finally, 
the working table is modified according to changes in the weight table. 


Step 11— Weight table 


       BL1     BL2     CP1     CP2      OE 
BL1    101     100      61     101     101 
BL2    100     100     100   (100)     100 
CP1     61     100     100   (100)     100 
CP2    101   (100)   (100)     100     100 
OE     101     100     100     100     100 


Step 12—Working table 


                        Available      Lowest 
            Connection  Outward        Connection  Corresponding 
Facility    Count       Connections    Cost        Facility 
BL1         2           1              61          CP1 
BL2         (2)         (0)            (100)       (BL1) 
CP1         0           0              61          BL1 
CP2         (1)         0              (100)       BL2 
OE          1           0              100         BL2 


If we choose among the facilities already on the connection list (OE, 
BL1, BL2, CP2), the one with the lowest connection cost in the weight 
table is the BL1 entry. Therefore, BL1-CP1 is added to the connection 
list. 

Step 13—Connection list 
OE-BL1 
BL1-BL2 
BL2-CP2 
BL1-CP1 


Since the connection count to CP1 is zero, this is an acceptable
choice. The connection counts of BL1 and CP1 are incremented
(Step 15, working table). The number of AOC to the old facility (BL1)
is decremented (Step 15, working table). Since both BL1 and CP1 now
have zero AOC, the weights of all edges emanating from BL1 and CP1
are set to the maximum value (Step 14, weight table). Finally, the
working table is modified according to changes in the weight table.


Step 14—Weight table 


BL1 BL2 CP1 CP2 OE 





Step 15—Working table

            Connection   Available Outward   Lowest Connection   Corresponding
Facility    Count        Connections         Cost                Facility
BL1         (3)          (0)                 (100)               (BL2)
BL2         2            0                   100                 BL1
CP1         (1)          0                   (100)               BL1
CP2         1            0                   100                 BL2
OE          1            0                   100                 BL2


The algorithm is completed when the connection list contains N-1 
entries, where N equals the number of facilities. In this case N-1 = 4 
and so all connections have been obtained. The remainder of the 
algorithm transforms the connection list to a linear list. 

Initially, the connection count for each facility, the connection list, 
and the linear list are as shown in Step 16. Search the connection 
count (from the bottom) to find a facility with a connection count of 
one. In this case the facility found is the OE. Next, search the 
connection list (from the bottom) to find a corresponding facility. In
this case the facility is BL1. Place these two facilities on the linear 
list and decrement the connection count for each. 


Step 16—Linear list

            Connection
Facility    Count         Connection List    Linear List
BL1         3             OE-BL1
BL2         2             BL1-BL2
CP1         1             BL2-CP2
CP2         1             BL1-CP1
OE          1


Applying this algorithm results in the table shown in Step 17. Since
the connection count for BL1 is greater than zero, search the connection
list (from the bottom) to find another entry for BL1. The
connection BL1-CP1 is found, so CP1 is added to the linear list, and
the connection counts for both BL1 and CP1 are decremented.


Step 17—Linear list

            Connection
Facility    Count         Connection List    Linear List
BL1         2             ------             OE
BL2         2             BL1-BL2            BL1
CP1         1             BL2-CP2
CP2         1             BL1-CP1
OE          0


The table now changes to what is shown in Step 18. Since the 
connection count for CP1 is zero, the connection count list is again 
searched (from the bottom) but only for facilities on the linear list 
(i.e., OE, BL1, CP1) for an entry with a nonzero connection count. 
The entry found is BL1. Searching the connection list for a corre- 
sponding facility results in the addition of the BL1-BL2 connection 
to the linear list. The connection count of each of these facilities is 
therefore decremented. 


Step 18—Linear list

            Connection
Facility    Count         Connection List    Linear List
BL1         1             ------             OE
BL2         2             BL1-BL2            BL1
CP1         0             BL2-CP2            CP1
CP2         1             ------
OE          0

The table now changes to what is shown in Step 19. Since the
connection count for BL2 is greater than zero, search the connection
list to find another entry for BL2. The connection BL2-CP2 is found
so CP2 is added to the linear list and the connection counts for both
BL2 and CP2 are decremented.


Step 19—Linear list

            Connection
Facility    Count         Connection List    Linear List
BL1         0             ------             OE
BL2         1             ------             BL1
CP1         0             BL2-CP2            CP1
CP2         1             ------             BL1
OE          0                                BL2


The table now changes as shown in Step 20. Since all connection 
counts are zero, the algorithm terminates. 


Step 20—Linear list

            Connection
Facility    Count         Connection List    Linear List
BL1         0             ------             OE
BL2         0             ------             BL1
CP1         0             ------             CP1
CP2         0             ------             BL1
OE          0             ------             BL2
                                             CP2
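
The conversion traced in Steps 16 through 20 can be sketched in Python. This is only an illustrative reading of the worked example, not the author's program: the function name `linearize`, the argument layout, and the choice to infer the connection counts from the connection list are assumptions.

```python
def linearize(facilities, connections):
    """Turn a spanning-tree connection list into a linear list.

    facilities  -- facility names in working-table order (top to bottom)
    connections -- (a, b) pairs in the order they were added
    """
    # Connection count = number of connection-list entries at a facility.
    count = {f: 0 for f in facilities}
    for a, b in connections:
        count[a] += 1
        count[b] += 1

    remaining = list(connections)
    # Start at the first facility, searching from the bottom, with count 1.
    current = next(f for f in reversed(facilities) if count[f] == 1)
    linear = [current]

    while remaining:
        if count[current] == 0:
            # Dead end: resume from a facility already on the linear list
            # that still has unused connections (searching from the bottom).
            current = next(f for f in reversed(facilities)
                           if f in linear and count[f] > 0)
            linear.append(current)
        # Search the connection list from the bottom for an entry at `current`.
        for idx in range(len(remaining) - 1, -1, -1):
            a, b = remaining[idx]
            if current in (a, b):
                other = b if a == current else a
                del remaining[idx]
                break
        else:
            raise ValueError(f"no unused connection for {current}")
        count[current] -= 1
        count[other] -= 1
        linear.append(other)
        current = other
    return linear
```

Called with the facilities BL1, BL2, CP1, CP2, OE and the connection list of Step 13, the sketch visits the facilities in the same order as the linear list of Step 20.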

Vector quantization has been used in coding applications for several years.
Recently, quantization of linear predictive coding (LPC) vectors has been used 
for speech coding and recognition. In these latter applications, the only method 
that has been used for deriving the vector quantizer code book from a set of 
training vectors is the one described by Linde, Buzo, and Gray. In this paper, 
we compare this algorithm to several alternative algorithms and also study 
the properties of the resulting code books. Our conclusion is that the various 
algorithms that we tried gave essentially identical code books. 


I. INTRODUCTION 


The technique of vector quantization for LPC voice coding has been
in use for several years, and has been shown to be of great utility for
LPC analysis/synthesis systems.¹⁻⁴ Recently, vector quantization of
LPC vectors has been applied to speech-recognition systems both in
direct applications⁵,⁶ and in conjunction with work on the application
of hidden Markov models (HMMs) to recognition.⁷,⁸

The main idea of vector quantization is summarized as follows:
assume that a training set {T} of I LPC vectors is given. It is desired
to find a code book of M* LPC vectors such that the average distance
of a vector in {T} from the closest code book entry is minimized. Thus
we wish to find a set {R} of reference vectors that minimizes the
average distance D_I(M*) given by

$$D_I(M^*) = \min_{\{R\}} \frac{1}{I} \sum_{i=1}^{I} \min_{1 \le m \le M^*} d(T_i, R_m), \qquad (1)$$

where d(T_i, R_m) is the LPC distance between training vector T_i and
code book entry R_m.

* Bell Laboratories.

The optimum code book is generated by a method similar to the K- 
means algorithm. Starting with an initial guess of M* entries, each 
vector of the training set is assigned to the closest entry. The centroids 
of the M* subsets (clusters) obtained in this manner are used as new 
trial entries in the code book, and the iteration is continued until some 
stopping criterion is satisfied. 

For large M*, the choice of initial guesses can be quite important, 
and it is unlikely that a randomly chosen initial guess is a good one. 
For this reason the splitting algorithm was devised in Ref. 1. In this 
algorithm a code book of M = 2 entries is optimized, as described 
above, starting with a random initial guess. Next, each optimum code 
book entry for M = 2 is split into 2 and used as an initial guess for a 
code book of size 2M. This process is used until M = M*. To 
distinguish this algorithm from others considered later, we call it the 
binary-split algorithm. 

To the best of our knowledge, all speech-related applications of 
vector quantization so far have used this binary-split algorithm. How- 
ever, a priori, the requirement that every code word be split appears 
to be too restrictive. For example, after optimizing an M = 2 code 
book, if one cluster contains almost all the training set and the other 
contains just a few elements, it might be argued that only the larger 
cluster should be split. Thus it is of interest to consider “single-split” 
algorithms in which a single cluster is split at each iteration. 

For very large M* (e.g., 1024 or 2048) single-split algorithms might 
require prohibitive amounts of computation. However, M* on the 
order of 64 or 128 can be quite useful in certain applications.⁸ In these
cases a single-split algorithm is quite feasible. In any case, it is of 
interest to know whether or not a single-split algorithm yields a better 
code book than the binary-split algorithm. 

There are at least three reasonable ways of implementing the 
splitting rule of a single-split algorithm for training the vector quan- 
tizer. To describe these three splitting rules we need some definitions. 
Let 


{T_M(m)} = The set of training vectors represented by the mth code
book entry (cluster) in a size M vector quantizer

C_M(m) = The number of training vectors in T_M(m)

d_M(m) = The average distance (distortion) of the C_M(m) vectors
from the mth code-book entry

D_M(m) = The total distance (distortion) of the C_M(m) vectors.

We then have the relationships

$$I = \sum_{m=1}^{M} C_M(m) \qquad (2)$$

$$d_M(m) = \frac{1}{C_M(m)} \sum_{q=1}^{C_M(m)} d\big(T_M(m)_q, R_m\big) \qquad (3)$$

$$D_M(m) = C_M(m) \cdot d_M(m). \qquad (4)$$

Using eqs. (2) through (4) we can write the average distortion of eq.
(1) as

$$D_I(M) = \min_{\{R\}} \left[ \frac{\sum_{m=1}^{M} D_M(m)}{\sum_{m=1}^{M} C_M(m)} \right] \qquad (5a)$$

$$\phantom{D_I(M)} = \min_{\{R\}} \left[ \frac{\sum_{m=1}^{M} d_M(m)\, C_M(m)}{\sum_{m=1}^{M} C_M(m)} \right]. \qquad (5b)$$
Based on the above definitions, the three splitting rules we have 
considered are: 

Rule 1: Split the cluster, m, with the largest number of vectors,
C_M(m). We denote the resulting vector quantizer (VQ)
code-word set as R_c.

Rule 2: Split the cluster, m, with the largest average distortion,
d_M(m). We denote the resulting VQ code-word set as R_d.

Rule 3: Split the cluster, m, with the largest total distortion,
D_M(m). We denote the resulting VQ code-word set as R_D.

The key issue is how the different splitting rules affect the
properties of the resulting vector quantizer, in particular the average
distortion [eq. (1)] and the coverage of the LPC space.
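
As a rough illustration of the three splitting rules, the sketch below computes the per-cluster count, average distortion, and total distortion and picks the cluster each rule would split. It is not the authors' code: squared Euclidean distance stands in for the LPC distance, and the names `cluster_stats` and `cluster_to_split` and the rule labels are hypothetical.

```python
import numpy as np

def cluster_stats(train, codebook):
    """Return per-cluster count, average distortion, and total distortion.

    Assumption: squared Euclidean distance replaces the LPC distance.
    """
    train = np.asarray(train, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)                      # closest code word
    nearest = d[np.arange(len(train)), labels]     # distance to it
    m = len(codebook)
    count = np.bincount(labels, minlength=m)
    total = np.bincount(labels, weights=nearest, minlength=m)
    # Average distortion is total / count; empty clusters are reported as 0.
    avg = np.divide(total, count, out=np.zeros(m), where=count > 0)
    return count, avg, total

def cluster_to_split(count, avg, total, rule):
    """Index of the cluster to split under Rule 1, 2, or 3."""
    if rule == "count":      # Rule 1: largest number of vectors
        return int(np.argmax(count))
    if rule == "average":    # Rule 2: largest average distortion
        return int(np.argmax(avg))
    if rule == "total":      # Rule 3: largest total distortion
        return int(np.argmax(total))
    raise ValueError(f"unknown rule: {rule}")
```

With counts (100, 3, 40) and average distortions (0.1, 2.0, 0.5), for example, the three rules pick three different clusters, which is why the rules can lead to different code books.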

We have run a series of experimental evaluations of the single-split 
and binary-split algorithms for training the VQ. We have found that 
each of the different splitting criteria leads to a different reference 
prototype set (VQ code book); however, all the VQ sets had essentially 
the same average distortion. We were also able to show that the 
coverage of the LPC space for all VQ sets was identical, and that the 
average distance of any one VQ set from another VQ set was smaller
than the average distortion of the training set. Hence, the different 
implementations of the training algorithm for the VQ lead to equiva- 
lent VQ reference sets. Thus for any practical application the simple 
binary-split algorithm is effective for deriving the VQ code book 
entries. 

The outline of this paper is as follows. In Section II we review the 
Linde et al.¹ implementation of the binary-split VQ training algorithm
and show how we modified it to handle the single-split case. In Section 
III we discuss the results of several experiments on testing the different 
implementations of the training algorithm. In Section IV we provide 
a discussion and summary of the results. 


II. IMPLEMENTATION OF THE VQ TRAINING ALGORITHM


The implementation of the VQ training algorithm is essentially the
one proposed by Linde et al.¹ A flow diagram of this procedure for the
binary-split case is given in Fig. 1a and for the single-split case in Fig.
1b. Given M code words, each vector of the training set T is assigned
to the code word closest to it. The average distortion D_I(M) is
computed for this assignment of the I training vectors to M clusters.
M new code words are obtained as centroids (i.e., averaged normalized
autocorrelations) of each cluster, and the distortion D_I(M) is computed
again. This process is iterated until it converges, i.e., until the percent
change in distortion is less than a preset value ε (chosen to be 1
percent in our simulations). Once convergence is achieved, M is
doubled by splitting each code word into two. The entire process is
repeated until M = M*. The iteration is initialized by choosing two
arbitrary code words.
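
The training loop described above can be sketched in Python. This is an illustrative sketch, not the authors' implementation: squared Euclidean distance and the arithmetic mean stand in for the LPC distance and the averaged-normalized-autocorrelation centroid, and the additive perturbation `delta` used to split a code word, the fixed random seed, and all function names are assumptions. An empty cluster simply keeps its old code word here.

```python
import numpy as np

def nearest(train, codebook):
    """Assign each training vector to its closest code word.

    Returns the cluster labels and the average distortion.
    """
    d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    return labels, d[np.arange(len(train)), labels].mean()

def lloyd(train, codebook, eps=0.01, max_iter=100):
    """Iterate classification and centroid update until the relative
    change in average distortion falls below eps (1 percent by default)."""
    labels, dist = nearest(train, codebook)
    for _ in range(max_iter):
        codebook = codebook.copy()
        for m in range(len(codebook)):
            members = train[labels == m]
            if len(members):                  # empty cluster: keep old entry
                codebook[m] = members.mean(axis=0)
        labels, new_dist = nearest(train, codebook)
        converged = dist == 0 or (dist - new_dist) / dist < eps
        dist = new_dist
        if converged:
            break
    return codebook, dist

def binary_split(train, m_star, eps=0.01, delta=1e-3, seed=0):
    """Grow a code book from 2 to m_star entries by doubling."""
    if m_star < 2 or m_star & (m_star - 1):
        raise ValueError("m_star must be a power of two, at least 2")
    train = np.asarray(train, dtype=float)
    rng = np.random.default_rng(seed)
    # Two arbitrary code words: here, two distinct training vectors.
    codebook = train[rng.choice(len(train), size=2, replace=False)]
    codebook, dist = lloyd(train, codebook, eps)
    while len(codebook) < m_star:
        # Split every code word into two slightly perturbed copies.
        codebook = np.concatenate([codebook + delta, codebook - delta])
        codebook, dist = lloyd(train, codebook, eps)
    return codebook, dist
```

A single-split variant would replace the doubling step by splitting only the one cluster chosen by the count, average-distortion, or total-distortion criterion.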

In our implementation, we made one modification to the VQ training 
algorithm of Fig. 1. We inserted a check after the classification of the 
training set vectors to see if any cluster is empty (i.e., contains none 
of the training set vectors). In such a case the “largest” cluster is split 
into two clusters, and the convergence test is bypassed (to ensure a 
reclassification in which each cluster is nonempty). However, for the 
data used in this experiment, an empty cluster never occurred. In 
subsequent tests with larger M* we did encounter such cases. 

For the single-split algorithm (Fig. 1b), only one modification is 
required. After convergence, only the “largest cluster” is split. Here 
largest can refer to the cluster with the largest average distortion, total 
distortion, or count. 

For a convergence criterion of ε = 1 percent, it typically takes three
to six iterations of the classification loop to obtain a convergent set of
clusters and centroids. We also found that the algorithms of Figs. 1a
and 1b work extremely reliably over a broad range of types of training
data (e.g., collected from a single talker, collected from many talkers,
collected from a corpus of isolated words, collected from sentence-length
material, etc.).

Fig. 1—Flow charts of the vector quantizer training algorithms. (a) The binary-split algorithm. (b) The single-split algorithm.


III. COMPARISON OF THE BINARY- AND SINGLE-SPLIT ALGORITHMS


To compare the performances of the binary- and single-split VQ
training algorithms of Fig. 1, several tests were run. The database
consisted of a set of 39,708 LPC vectors. The LPC analysis used a
6.67-kHz sampling rate and an eighth-order analysis of 300-sample
(45-ms) frames of speech. The sample frames had been preemphasized
with a simple, first-order digital network (preemphasis factor of 0.95)
and windowed by a 300-sample Hamming window. Frames were taken
100 samples apart across the duration of each word of a series of 1000
isolated words (digits) spoken by 100 talkers (50 male, 50 female). All
recordings were made over dialed-up telephone lines through a local
PBX connection. All silence outside the spoken words was eliminated
by a word endpoint detector;⁹ hence, all LPC training frames were
from within word boundaries.
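
The framing parameters quoted above can be made concrete with a short sketch. It is an assumption-laden illustration rather than the authors' front end: the sampling rate is taken as exactly 6670 Hz, preemphasis is applied to the whole signal before framing, and the names `analysis_frames` and `frame_duration_ms` are hypothetical.

```python
import numpy as np

FS = 6670.0        # assumed exact value of the "6.67-kHz" sampling rate
FRAME_LEN = 300    # samples per analysis frame
FRAME_SHIFT = 100  # frames are taken 100 samples apart
PREEMPH = 0.95     # first-order preemphasis factor

def frame_duration_ms():
    """Duration of one analysis frame in milliseconds (about 45 ms)."""
    return 1000.0 * FRAME_LEN / FS

def analysis_frames(signal):
    """Preemphasize, cut into overlapping frames, apply a Hamming window."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < FRAME_LEN:
        return np.empty((0, FRAME_LEN))
    # y[n] = x[n] - 0.95 * x[n-1]; the first sample is passed through.
    emphasized = np.append(signal[0], signal[1:] - PREEMPH * signal[:-1])
    n_frames = 1 + (len(emphasized) - FRAME_LEN) // FRAME_SHIFT
    window = np.hamming(FRAME_LEN)
    return np.stack([
        emphasized[i * FRAME_SHIFT:i * FRAME_SHIFT + FRAME_LEN] * window
        for i in range(n_frames)
    ])
```

Each row of the returned array is one windowed frame, ready for an eighth-order LPC analysis, which is not shown here.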

Several aspects of the binary- and single-split training algorithms
were studied. The first question considered was whether the two
training procedures yielded identical results (i.e., whether the resulting
LPC code words and the clusters from which they were derived were
identical). Figure 2 shows plots of the cluster splitting for an M* = 8
solution for the binary-split algorithm (Fig. 2a) and the single-split
algorithm based on average distance splitting (Fig. 2b). It can be seen
that the resulting eight clusters in the single-split case come from very
different splits than those for the binary-split case. For example, in
the single-split case, final clusters 6 and 7 come from four splits of the
original cluster 2, whereas final clusters 1 and 2 come from single
splits of original clusters 1 and 2. In the binary-split case all final
clusters come from two splits of original clusters 1 and 2. Similarly,
the actual clusters were grossly different for the three different criteria
for the single-split algorithm.

Fig. 2—Splitting charts for an M* = 8 vector quantizer with splits based on average
distortion. (a) The binary-split training algorithm. (b) The single-split algorithm.

The next question we considered was how the different training
procedures differed in performance. Figures 3 through 5 show a series
of plots of statistics comparing some of the details of the individual
training procedures. For each of these plots, Parts (a) through (d)
show results for the binary-split case, the single-split case based on
count, the single-split case based on average distortion, and the
single-split case based on total distortion. The statistics plotted are ratio of
maximum to minimum cluster count (Fig. 3), ratio of maximum to
minimum average distortion (Fig. 4), and ratio of maximum to minimum
total distortion (Fig. 5) versus size of the vector quantizer. These
statistics were chosen because each of them should ideally approach
1.0 for clusters that are of equal size according to the corresponding
splitting criterion. For example, we would expect the count ratio to
approach 1.0 for the split on count criterion but not necessarily for
the other splitting criteria.

Fig. 3—Plots of count ratio (maximum cluster count divided by minimum cluster
count) as a function of the size of the vector quantizer. (a) Binary-split training. (b)
Single-split training based on count. (c) Single-split training based on average distortion.
(d) Single-split training based on total distortion.

Fig. 4—Plots of average distortion ratio as a function of the size of the vector
quantizer. (a) Binary-split training. (b) Single-split training based on count. (c)
Single-split training based on average distortion. (d) Single-split training based on total
distortion.
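
The three ratio statistics plotted in Figs. 3 through 5 are simple to compute once the per-cluster counts and total distortions are known. The sketch below shows one way to do so; the names `max_min_ratio` and `cluster_ratios` are hypothetical, the numbers in the usage note are made up for illustration, and all clusters are assumed to be nonempty.

```python
def max_min_ratio(values):
    """Ratio of the largest to the smallest value (infinite if the min is 0)."""
    smallest = min(values)
    return float("inf") if smallest == 0 else max(values) / smallest

def cluster_ratios(counts, totals):
    """Max/min ratios of cluster count, average distortion, and total
    distortion, given per-cluster counts and total distortions."""
    averages = [t / c for t, c in zip(totals, counts)]  # needs counts > 0
    return {
        "count": max_min_ratio(counts),
        "average": max_min_ratio(averages),
        "total": max_min_ratio(totals),
    }
```

For made-up counts (120, 60, 30) and total distortions (12.0, 9.0, 6.0), the count ratio is 4 while the average- and total-distortion ratios are both 2, so a code book can be balanced under one criterion and unbalanced under another.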

Examination of Figs. 3 through 5 shows several interesting things.
As seen in Fig. 3, the count ratio for the binary-split case for M* = 64
(4.1) is actually smaller than the count ratio for the single split on
count case for M* = 64 (4.8). The count ratios for the other two split
criteria are indeed larger than for the split on count, as expected.
Figure 4 shows that the average-distortion ratio is smallest (4.1) at
M* = 64 for the single split on average-distortion case; however, the
distortion ratios for the binary case (4.4) and the single split on
total-distortion (4.7) cases are only slightly larger. Finally, Fig. 5 shows a
similar set of results on the total-distortion-ratio statistic in which
the results for M* = 64 for the binary-split case (2.7) are only slightly
worse than for the single split on total-distortion case (2.6).

The results of Figs. 3 through 5 indicate that the binary-split case
seems to yield cluster training statistics that are almost as good as the
best statistics for any of the single-split cases in terms of count ratio,
average-distortion ratio, and total-distortion ratio. Hence, from the
point of view of cluster statistics, the binary-split case appears to give
the best overall performance.

Fig. 5—Plots of total-distortion ratio as a function of the size of the vector quantizer.
(a) Binary-split training. (b) Single-split training based on count. (c) Single-split
training based on average distortion. (d) Single-split training based on total distortion.
Two gross performance checks were made on the training algorithms.
In the first test, the average distance between vector quantizer
sets obtained from the different training procedures was calculated as
a function of M*. The results of this test are given in Table I. It can
be seen that the average distance between vector quantizer sets is as
small as or smaller than the average distance of the training vectors to
the code book sets. Hence, the code book sets derived from the different
training algorithms are, on average, quite close to each other.

Table I—Average distance between code book entries of vector
quantizers designed on the basis of count (R_c), average distortion
(R_d), total distortion (R_D), and binary splitting (R_B)

M*    d(R_c, R_d)   d(R_c, R_D)   d(R_c, R_B)   d(R_d, R_D)   d_I(M*)†
4     0.384         0.019         0.047         0.270         0.707
8     0.125         0.138         0.157         0.101         0.426
16    0.148         0.143         0.160         0.065         0.326
32    0.191         0.108         0.175         0.132         0.255
64    0.216         0.131         0.148         0.131         0.203

† Average distance between the training vectors and the code words representing
them.

Fig. 6—Plot of average training set distortion D_I(M*) as a function of the size of the
vector quantizer.
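
The text does not spell out how the distance between two code books in Table I was computed. One plausible reading, offered here only as an assumption, is the average over the entries of one code book of the distance to the closest entry of the other. The sketch below implements that reading with squared Euclidean distance standing in for the LPC distance; `codebook_distance` is a hypothetical name.

```python
import numpy as np

def codebook_distance(ra, rb):
    """Average distance from each entry of ra to its nearest entry in rb.

    Assumptions: nearest-neighbor matching and squared Euclidean distance;
    neither is confirmed by the source.
    """
    ra = np.asarray(ra, dtype=float)
    rb = np.asarray(rb, dtype=float)
    d = ((ra[:, None, :] - rb[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()
```

Under this reading the measure is not symmetric in general, so the order of the two code books matters.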

The second test we performed was to measure the average distortion,
D_I(M*), versus M* for the different training algorithms for values of
M* from 2 to 64. The results of this test are plotted in Fig. 6. On the
scale of this plot, the differences in average distortion are
indistinguishable among the different vector quantizers.

Fig. 7—Plots of code-word coverage in the F1-F2, F1-F3, and F2-F3 planes for an M*
= 64 vector quantizer.

The third and final question we considered concerns the coverage
of the space of speech sounds by the optimum code books. A good way
of displaying this coverage is to look at the code books in the space of
formant frequencies. The formant frequencies (and bandwidths) for
each entry of the code book are given by the zeroes of the trigonometric
polynomial associated with it. Thus each code book may be displayed
as a scatter plot in F1-F2-F3 space. Projections of this scatter diagram
on the F1-F2, F1-F3, and F2-F3 planes are shown for a typical code book
in Figs. 7 and 8 for the code books obtained from the binary-split
training algorithm for M* = 64 (Fig. 7) and M* = 1024 (Fig. 8). It is
seen that the code words cover the expected regions in the formant
frequency planes fairly uniformly. The major difference between the
coverage of the M* = 1024 and the M* = 64 code books is the density
of coverage of the areas in the respective formant frequency planes.
The coverage of the single-split algorithms for M* = 64 was essentially
identical to that of the binary-split algorithm.

Fig. 8—Plots of code-word coverage in the F1-F2, F1-F3, and F2-F3 planes for an M*
= 1024 vector quantizer.
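
A common way to obtain formant estimates from an LPC vector is to root the inverse-filter polynomial and convert each complex root to a frequency and a bandwidth. The sketch below follows that standard recipe as an assumption about what the text calls the polynomial associated with a code book entry; the function name, the coefficient convention `[1, a1, ..., ap]`, and the 6670-Hz sampling rate are not taken from the source.

```python
import numpy as np

def formants_from_lpc(a, fs=6670.0):
    """Formant frequencies and bandwidths (Hz) from LPC coefficients.

    a  -- inverse-filter coefficients [1, a1, ..., ap] (assumed convention)
    fs -- sampling rate in Hz (assumed value)
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]          # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]
```

An eighth-order analysis yields at most four conjugate pairs, so the three lowest frequencies returned for each code word would give one point in the F1-F2-F3 scatter plot.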


IV. DISCUSSION 


Our overall conclusion from the tests that compared the fine and
gross differences in clustering LPC vectors via a VQ training algorithm
is that all the variations in the training procedure that we studied (i.e.,
different splitting procedures, different convergence criteria, etc.) lead
to essentially indistinguishable sets of VQ code book
entries. Since the binary-split algorithm, as discussed by Linde et al.,¹
requires the least amount of computation, it is the best of the
algorithms considered.

In this paper we present the results of a series of experiments on a 
training set of 39,708 vectors. More recently we have experimented 
with the binary-split VQ training procedure on a number of different 
training sets whose size varied from 10,000 to 600,000 vectors. We 
found that the training procedure always rapidly and reliably con- 
verged to a set of code book vectors whose properties were similar to 
those described in this paper. We are currently using the VQ code 
book sets in work related to speech recognition and speech coding. 


V. ACKNOWLEDGMENTS 


The authors gratefully acknowledge several fruitful discussions with 
Fred Juang of Bell Laboratories concerning the characteristic prop- 
erties of the VQ clusters. Juang’s insight into the training procedure 
and the resulting properties of the code book vectors coincided with 
the results we found in this study. 


REFERENCES 


1. Y. Linde, A. Buzo, and R. M. Gray, “An Algorithm for Vector Quantization,” IEEE 
Trans. Commun., COM-28, No. 1 (January 1980), pp. 84-95. 

2. B. Juang, D. Wong, and A. H. Gray, Jr., “Distortion Performance of Vector 
Quantization for LPC Voice Coding,” IEEE Trans. on Acoust., Speech, and Signal 
Proc., ASSP-30, No. 2 (April 1982), pp. 294-303. 

3. A. Buzo, et al., “Speech Coding Based Upon Vector Quantization,” IEEE Trans. on 
Acoust., Speech, and Signal Proc., ASSP-28, No. 5 (October 1980), pp. 562-74. 

4. D. Wong, B. Juang, and A. H. Gray, Jr., “An 800 Bit/S Vector Quantization LPC
Vocoder,” IEEE Trans. on Acoust., Speech, and Signal Proc., ASSP-30, No. 5
(October 1982), pp. 770-80.

5. A. Buzo, H. Martinez, and C. Rivera, “Discrete Utterance Recognition Based Upon
Source Coding Techniques,” Proc. ICASSP-82 (May 1982), pp. 539-42.

6. J. E. Shore and D. Burton, “Discrete Utterance Speech Recognition Without Time
Normalization,” Proc. ICASSP-82 (May 1982), pp. 907-10.

7. R. Billi, “Vector Quantization and Markov Source Models Applied to Speech
Recognition,” Proc. ICASSP-82 (May 1982), pp. 574-7.

8. L. R. Rabiner, S. E. Levinson, and M. M. Sondhi, “On the Application of Vector
Quantization and Hidden Markov Models to Speaker Independent, Isolated Word
Recognition,” B.S.T.J., 62, No. 4 (April 1983), pp. 1075-105.

9. L. F. Lamel, et al., “An Improved Endpoint Detector for Isolated Word Recognition,”
IEEE Trans. on Acoust., Speech, and Signal Proc., ASSP-29, No. 4 (August
1981), pp. 777-85.


AUTHORS 


Stephen E. Levinson, B. A. (Engineering Sciences), 1966, Harvard; M.S. 
and Ph.D. (Electrical Engineering), University of Rhode Island, Kingston, 
1972 and 1974, respectively; General Dynamics, 1966-1969; Yale University, 
1974-1976; Bell Laboratories, 1976—. From 1966 to 1969, Mr. Levinson was 
a design engineer at Electric Boat Division of General Dynamics in Groton, 
Connecticut. From 1974 to 1976, he held a J. Willard Gibbs Instructorship in 
Computer Science at Yale University. In 1976, he joined the technical staff at 
Bell Laboratories, where he is pursuing research in the areas of speech 
recognition and cybernetics. Member, Association for Computing Machinery; 
Fellow, Acoustical Society of America; Senior Member, IEEE; editorial board
of Speech Technology; Associate Editor, IEEE Transactions on Acoustics, 
Speech and Signal Processing. 


Lawrence R. Rabiner, S.B. and S. M., 1964, Ph.D. (Electrical Engineering), 
The Massachusetts Institute of Technology; Bell Laboratories, 1962—. From
1962 through 1964, Mr. Rabiner participated in the cooperative plan in 
electrical engineering at Bell Laboratories. He worked on digital circuitry, 
military communications problems, and problems in binaural hearing. Pres- 
ently, he is engaged in research on speech communications and digital signal 
processing techniques. He is coauthor of Theory and Application of Digital 
Signal Processing (Prentice-Hall, 1975), Digital Processing of Speech Signals 
(Prentice-Hall, 1978), and Multirate Digital Signal Processing (Prentice-Hall, 
1983). Former President, IEEE ASSP Society; former Associate Editor, ASSP
Transactions; former member, Technical Committee on Speech Communication
of the Acoustical Society, ASSP Technical Committee on Speech
Communication; Member, IEEE Proceedings Editorial Board, Eta Kappa Nu,
Sigma Xi, Tau Beta Pi. Fellow, Acoustical Society of America, IEEE.


Man Mohan Sondhi, B.Sc. (Physics), Honours degree, 1950, Delhi University,
Delhi, India; D.I.I.Sc. (Communications Engineering), 1953, Indian Institute 
of Science, Bangalore, India; M.S., 1955; Ph.D. (Electrical Engineering), 1957, 
University of Wisconsin, Madison, Wisconsin; Bell Laboratories, 1962—. 
Before joining Bell Laboratories, Mr. Sondhi worked for a year at the Central 
Electronics Engineering Research Institute, Pilani, India and taught for a year 
at the University of Toronto. At Bell Laboratories his research has included 
work on speech signal processing, echo cancellation, adaptive filtering, mod- 
elling of auditory and visual processes, and acoustical inverse problems. From 
1971 to 1972 Mr. Sondhi was a guest scientist at the Royal Institute of 
Technology, Stockholm, Sweden. 




THE BELL SYSTEM TECHNICAL JOURNAL 
Vol. 62, No. 8, October 1983 
Printed in U.S.A. 


Upper Bounds on the Minimum Distance of 
Trellis Codes 

A trellis code is a “sliding window” method of encoding a binary data stream
into a sequence of real numbers that are input to a noisy transmission channel. 
When a trellis code is used to encode data at the rate of k bits/channel symbol, 
each channel input will depend not only on the most recent block of k data 
bits to enter the encoder but will also depend on, say, the v bits preceding this 
block. The v bits determine the state of the encoder and the most recent block 
of k bits generates the channel symbol conditional on the encoder state. The 
performance of trellis codes, like that of block codes, depends on a suitably 
defined minimum-distance property of the code. In this paper we obtain upper 
bounds on this minimum distance that are simple functions of k and v. These 
results also provide a lower bound on the number of states required to achieve 
a specific coding gain. 


I. INTRODUCTION 


In this paper we are concerned with transmission of digital data
using trellis codes to gain some noise immunity over standard uncoded
methods. We assume pulse amplitude modulation whereby the values
of the transmitted data are estimated from a sequence of samples r^j
generated by a receiver. These output samples are often modeled as

$$r^j = x^j + n^j, \qquad (1)$$

where x^j is a real number sequence determined by the source sequence
of binary data and n^j is an independent zero-mean white Gaussian
noise sequence of variance σ². For uncoded transmission at rate k
bits/symbol, x^j takes on one of 2^k fixed values. Error performance may be
improved using coding, but if we insist on transmitting at rate k
bits/symbol then we must increase the number of possible values taken by
the x^j. We can choose either a block or tree (trellis) structure for the
code. In this paper we consider only trellis codes. The performance of
trellis codes, like that of block codes, depends on a suitably defined
minimum-distance property of the code. We obtain upper bounds on
this minimum distance, d_min. The analogous problem for block codes
is well studied, but little work has been done on distance properties of 
trellis codes.'” 

We assume the following model for encoding the binary data (i.e.,
choosing the x^j) prior to transmission over the Gaussian channel.
Regard the incoming binary digits as partitioned into blocks of k
consecutive bits. The real number x^j is to be a time-independent
function of the most recent k-bit block and also of the v bits preceding
this block. Thus if {a_i} is the binary data sequence, we assume

$$x^j = x\big(a_{jk}, a_{jk-1}, \ldots, a_{jk-(k-1)};\ a_{(j-1)k}, \ldots, a_{(j-1)k-(v-1)}\big). \qquad (2)$$

This is an example of a k-bit/symbol trellis code. We regard the v
“old” bits as determining the state of the encoder (there are 2^v possible
states) and the k “new” bits as generating the channel symbol (there
are 2^k possible symbols) conditional on the encoder state. The trellis
structure is made evident by drawing an example. Fig. 1 shows the
case k = 1, v = 2.

If, in this example, the encoder is in state (00) at time j, and the
next bit (block of k = 1 bits) to be transmitted is a 1, then we transmit
the symbol x(100) and move to state (10).
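
The state update in this example can be traced with a small sketch of a sliding-window encoder. It returns only the symbol labels, such as x(100), because the real values assigned to the labels are a design choice that the text leaves open; the function name `encode` and the tuple representation of the state are assumptions.

```python
def encode(bits, k=1, v=2, initial_state=None):
    """Trace a sliding-window trellis encoder.

    bits  -- data bits in time order; length must be a multiple of k
    state -- the v most recent "old" bits, newest first
    Returns the list of symbol labels and the final state.
    """
    if len(bits) % k:
        raise ValueError("number of bits must be a multiple of k")
    state = tuple(initial_state) if initial_state is not None else (0,) * v
    labels = []
    for start in range(0, len(bits), k):
        block = tuple(bits[start:start + k])
        # Label = new block (newest bit first) followed by the state bits.
        label = block[::-1] + state
        labels.append("".join(map(str, label)))
        # The v newest bits become the next state.
        state = label[:v]
    return labels, state
```

Starting in state (00) and feeding the bits 1, 0, 0 produces the labels 100, 010, 001 and returns the encoder to state (00).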

Other trellis codes exist. For example, we could define a code with
just three trellis states or the symbols x^j could also depend on the time
index j. However, we shall only consider trellis codes determined by
(2). The trellis structure of (2) is identical to that of linear algebraic
convolutional codes. We use the term sliding window trellis codes for
trellis codes determined by (2).

Fig. 1—Diagram of a trellis code.

To simplify the discussion in the text we shall assume that k divides 
v. The general case is treated in Appendix B. 

The problem we consider involves certain distance properties of 
trellis codes. To motivate it, consider the decoding problem. Optimum 
decoding involves finding the most likely path through the trellis, 
given the observed sequence (1).? Typically, the path chosen will not 
coincide with the correct path for all time but will occasionally diverge 
from it and remerge at a later time. This is called an error event, and 
we generically denote it by the letter E. For example, with the trellis 
in Fig. 1, x(000) may have been sent several times in succession, 
resulting in the straight path shown in Fig. 2, but noise may have 
caused the decoder to choose an alternate path. In Fig. 2 the decoder 
chose the symbols x(100), x(010), x(001) instead of x(000), x(000), 
x(000). 

An error event E of length L lasts from time i to time i + L, the
decoder having decided upon the symbol sequence $\hat{x}^{i+1}, \ldots, \hat{x}^{i+L}$
instead of the correct sequence $x^{i+1}, \ldots, x^{i+L}$. The (squared) Euclidean
distance $d^2$ ($= d^2(E)$) between the two paths of E is given by

$$ d^2 = \sum_{j=i+1}^{i+L} (x^j - \hat{x}^j)^2 \tag{3} $$

and is crucial to determining the probability P(E) of an error event
E. With the white noise assumption made in (1), P(E) is easy to
calculate and, when $d^2 \gg \sigma^2$, it is approximately given by

$$ P(E) \approx \exp\left(-\frac{d^2}{8\sigma^2}\right). \tag{4} $$

Equation (4) leads us to expect that, for small noise, symbol error 
probabilities will be determined by error events having the smallest 
minimum distance between their two paths and it becomes of interest 
to design codes that have good minimum-distance properties in this 
sense. Such designs have recently been considered by Ungerboeck,
who obtained on the order of 3-dB performance improvements (factor
of 2 in minimum distance) over the uncoded case for k = 1, 2, and four
or eight states in the trellis.³

[Fig. 2. Example of an error event.]

Ungerboeck based his designs on a computer search of binary
convolutional codes with $2^\nu$ states, rate k/(k + 1), and a particular
mapping of the output binary (k + 1) tuples to $2^{k+1}$ equally spaced
channel symbols (±1, ±3, etc.). His use of convolutional codes thus
conforms to the general scheme of (2), which implies the same trellis
structure as described herein. However, his a priori choice of only $2^{k+1}$
equally spaced channel symbols is certainly restrictive in principle. In
this paper we consider the natural question of how large $d^2_{\min}/P$ can
be made if these restrictions are removed. Here, $d_{\min}$ is the minimum
distance between all pairs of paths associated with error events in the
trellis, and P is the average transmitted power.

Section II gives a detailed description of the trellis structure and of
error events. If S is a finite set of error events, then

$$ \min_{E \in S} \{d^2(E)\} \le \frac{1}{|S|} \sum_{E \in S} d^2(E), \tag{5} $$

since the minimum of a set of real numbers is bounded above by their
average. This observation is the basis of our first two bounds. The
first and simplest bound is

$$ \frac{d^2_{\min}}{P} \le 4\left(1 + \frac{\nu}{k}\right), \tag{6} $$
which is obtained in Section III. A more detailed analysis in Section
IV gives

$$ \frac{d^2_{\min}}{P} \le \frac{2^{k+1}}{2^k - 1}\left(1 + \frac{\nu}{k}\right), \tag{7} $$

which is stronger than (6) provided k > 1. Let T be another finite set
of error events and let $r_1, r_2 \ge 0$ be real numbers satisfying $r_1 + r_2 = 1$.
Then,

$$ \min_{E \in S \cup T} \{d^2(E)\} \le r_1 \left(\frac{1}{|S|} \sum_{E \in S} d^2(E)\right) + r_2 \left(\frac{1}{|T|} \sum_{E \in T} d^2(E)\right), \tag{8} $$

since the minimum of a set of real numbers is bounded above by any
weighted average of those numbers. In Section V, by choosing S, T,
$r_1$, and $r_2$ appropriately, we prove

$$ \frac{d^2_{\min}}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right). \tag{9} $$
This bound is stronger than (7) provided $\nu > k(2^k - 1)$. Combining (7)
and (9) we have

$$ \frac{d^2_{\min}}{P} \le \min\left[\frac{2^{k+1}}{2^k - 1}\left(1 + \frac{\nu}{k}\right),\ \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right)\right]. \tag{10} $$

Extensions of bounds (6), (7), and (9) to the case when k does not
divide $\nu$ are given in Appendix B.


II. A GROUP ACTION ON THE TRELLIS


In later sections we obtain upper bounds on $d^2_{\min}/P$ by considering
sets of error events that are fixed by a group of symmetries of the
trellis. In this section we describe the group.

We consider trellis codes with $2^\nu$ states transmitting k bits/channel
symbol and for simplicity we assume that k divides $\nu$. States are
labelled with binary $\nu$ tuples, and edges of the trellis are labelled with
binary $\nu + k$ tuples. We identify the binary r tuple $(b_0, \ldots, b_{r-1})$ with
the integer

$$ b_0 2^0 + b_1 2^1 + \cdots + b_{r-1} 2^{r-1}. $$

The states are labelled with binary $\nu$ tuples 00···0, 10···0,
010···0, 110···0, ···, 11···1, in increasing order, from top to
bottom as in Fig. 1. The edges are labelled with binary $\nu + k$ tuples
$x_0 = x(0 \cdots 0)$, $x_1 = x(10 \cdots 0)$, $x_2 = x(010 \cdots 0)$, $x_3 = x(110 \cdots 0)$,
···, $x_{2^{\nu+k}-1} = x(11 \cdots 1)$, also in increasing order, from top to bottom
as in Fig. 1. Set $N = (k + \nu)/k$. If we write an edge label as $x(s_0, \ldots, s_{N-1})$,
then it will be understood that each $s_i$ is a binary k tuple. A “+”
appearing in the argument of a label means bit-by-bit modulo 2
addition. A similar notation will be used for states.

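We define a group of symmetries of the trellis. These symmetries
will map error events of length L to error events of length L. For each
binary $\nu + k$ tuple t, we define a permutation $g_t$ of the edge labels x(s)
by the rule

$$ g_t(x(s)) = x(s + t). \tag{11} $$

For example, when k = 1, $\nu$ = 2, and t = (010),

$$ g_{010}: \quad \begin{array}{lcl}
x(000) & \mapsto & x(010) \\
x(100) & \mapsto & x(110) \\
x(010) & \mapsto & x(000) \\
x(110) & \mapsto & x(100) \\
x(001) & \mapsto & x(011) \\
x(101) & \mapsto & x(111) \\
x(011) & \mapsto & x(001) \\
x(111) & \mapsto & x(101)
\end{array} \tag{12} $$

This may also be written

$$ g_{010}(x) = Tx, \tag{13} $$

where T is the permutation matrix

$$ T = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}. \tag{14} $$

If $G_{\nu,k} = \{g_t \mid t \text{ is a binary } \nu + k \text{ tuple}\}$, then $G_{\nu,k}$ is an abelian group of
order $2^{\nu+k}$, and every element $g_t$ of $G_{\nu,k}$ satisfies $g_t^2 = e$, where e is the
group identity.
Lemma 1: Any pair of edge labels is interchanged by a unique group
element.
Proof: Edge labels x(s) and x(u) are interchanged only by $g_{s+u}$. □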
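The permutation rule and Lemma 1 can be checked by brute force for k = 1, $\nu$ = 2. The following is a minimal sketch in which edge labels are represented as Python tuples of bits; the helper names `g` and `interchangers` are hypothetical.

```python
from itertools import product


def g(t, s):
    """Apply g_t to the label s: bit-by-bit modulo-2 addition of t."""
    return tuple(a ^ b for a, b in zip(s, t))


labels = list(product((0, 1), repeat=3))  # all binary 3-tuples (k = 1, nu = 2)

# The action of g_t for t = (0, 1, 0) on every edge label.
table_010 = {s: g((0, 1, 0), s) for s in labels}

# Every g_t is its own inverse.
involution = all(g(t, g(t, s)) == s for t in labels for s in labels)


def interchangers(s, u):
    """All t for which g_t swaps the labels s and u."""
    return [t for t in labels if g(t, s) == u and g(t, u) == s]


# Lemma 1: exactly one group element swaps any two labels, namely g_{s+u}.
lemma_1 = all(interchangers(s, u) == [g(s, u)] for s in labels for u in labels)
```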
We call the time sections (0, 1), (1, 2), ··· the components of the
trellis. We shall now show how to choose binary $\nu + k$ tuples $t = t^0, t^1, \cdots$
so that if $g_{t^i}$ is applied to the edges in component i, then an error
event of length L is always mapped to another error event of length
L. It is, in general, necessary to choose a different $g_t$ for each component
since if we simply apply the same permutation $g_t$ to the edges in
every component, then an error event E need not be transformed to
another error event. Thus, if $g_{010}$ is applied to each component of the
error event shown in Fig. 2, then we obtain the edges shown in Fig. 3.
The permutation $g_{010}$ transforms the edge labelled x(uvw), joining state
vw and state uv, into the edge labelled x(u(1 + v)w), joining state
(1 + v)w and state u(1 + v). If $t = t^0 = 010$, then $g_{010}$ permutes the
encoder states at time 0 by the rule

$$ vw \mapsto (1 + v)w, \tag{15} $$

and permutes the encoder states at time 1 by the rule

$$ uv \mapsto u(1 + v). \tag{16} $$

[Fig. 3. Permutation $g_{010}$ applied to all edges of an error event.]

Similarly, the permutation $g_{t^1}$ permutes encoder states at time 1 and
encoder states at time 2. If we want to map error events to error
events, then the action of $g_{t^1}$ on encoder states at time 1 must be given
by (16). Choose $t^1 = 001$, $t^2 = 100$, $t^3 = 010$, $t^4 = 001$, ···. The action
of $g_{t^0}$, $g_{t^1}$, and $g_{t^2}$ on components 0, 1, and 2 is shown in Fig. 4. Thus
the sequence $(g_{t^0}, g_{t^1}, g_{t^2}, \cdots)$ transforms the error event shown in
Fig. 2 to the error event shown in Fig. 5.

For general k and $\nu$, let $t = t^0 = (t_0, \ldots, t_{N-1})$ where $N = (k + \nu)/k$
and $t_0, \ldots, t_{N-1}$ are binary k tuples. Let $t^1 = (t_{N-1}, t_0, \ldots, t_{N-2})$ be
the vector obtained from $t^0$ by cycling the blocks of k bits to the right
and moving the last block, $t_{N-1}$, to the front. Repeat this operation i
times to obtain $t^i = (t_{N-i}, \ldots, t_{N-1}, t_0, \ldots, t_{N-i-1})$. For $i \ge N$ we view
i as an integer modulo N. Thus $t^N = t^0 = t$, $t^{N+1} = t^1$, ···. The action
of $g_{t^i}$ on encoder states at time i coincides with that of $g_{t^{i-1}}$, being given
by the rule

$$ s \mapsto (t_{N-i+1}, \ldots, t_{N-1}, t_0, \ldots, t_{N-i-1}) + s. \tag{17} $$

If $G^*_{\nu,k} = \{(g_{t^0}, g_{t^1}, \cdots) \mid t^0 \text{ is a binary } \nu + k \text{ tuple}\}$, then $G^*_{\nu,k}$ is a group of
$2^{\nu+k}$ symmetries of the trellis. The group $G^*_{\nu,k}$ is abelian, and every
nonidentity element has order 2. We denote $(g_{t^0}, g_{t^1}, \cdots)$ by $g^*_t$, since it is
determined by $t^0$.
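The need for a different permutation in each component can be seen numerically. The sketch below assumes each k-bit block is stored as a Python integer and each edge label as a tuple of N blocks; the helper names are hypothetical. It applies the cyclically shifted tuples to the error event of Fig. 2 and, for contrast, applies the same tuple in every component.

```python
def shift_blocks(t, i):
    """Return t^i: the blocks of t cycled i places to the right (i taken mod N)."""
    n = len(t)
    i %= n
    return t[n - i:] + t[:n - i]


def add(s, t):
    """Block-by-block modulo-2 addition; each block is an integer of k bits."""
    return tuple(a ^ b for a, b in zip(s, t))


def is_path(edges):
    """An edge (s_0, ..., s_{N-1}) runs from state (s_1, ..., s_{N-1}) to
    state (s_0, ..., s_{N-2}); consecutive edges must share a state."""
    return all(e[:-1] == f[1:] for e, f in zip(edges, edges[1:]))


def act(t, edges):
    """Apply g_{t^i} to the edge in component i, for i = 0, 1, 2, ..."""
    return [add(e, shift_blocks(t, i)) for i, e in enumerate(edges)]


# The error event of Fig. 2 (k = 1, nu = 2).
correct = [(0, 0, 0), (0, 0, 0), (0, 0, 0)]
decoded = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
t = (0, 1, 0)

moved_correct, moved_decoded = act(t, correct), act(t, decoded)
same_everywhere = [add(e, t) for e in correct]  # g_010 in every component
```

Here `moved_correct` and `moved_decoded` are again connected paths that share their first and last states, while `same_everywhere` is not a connected path.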


Lemma 2: If $i \ge 0$ and if x(s), x(t) are any pair of edge labels in
component i, then there is a unique element of $G^*_{\nu,k}$ that interchanges
x(s) and x(t).
Proof: This follows from Lemma 1, since the restriction of $G^*_{\nu,k}$ to the
edges in component i is just $G_{\nu,k}$. □

A set S of error events is said to be fixed by $G^*_{\nu,k}$ if for all $g \in G^*_{\nu,k}$
and all $E \in S$ we have $g(E) \in S$.


[Fig. 4. Action of $g_{t^0}$, $g_{t^1}$, and $g_{t^2}$ on components 0, 1, and 2.]

[Fig. 5. The symmetry $(g_{t^0}, g_{t^1}, g_{t^2}, \cdots)$ applied to an error event.]

Lemma 3: Let $i \ge 0$ and let S be a set of error events of the same length
that is fixed by $G^*_{\nu,k}$. If $m_i(x(a))$ is the total number of times the edge
label x(a) occurs in component i of the error events of S, then

$$ m_i(x(a)) = \frac{2|S|}{2^{\nu+k}} \quad \text{for all } \nu + k \text{ tuples } a. $$

Proof: Let s, t be binary $\nu + k$ tuples. By Lemma 2 there is an element
of $G^*_{\nu,k}$ interchanging error events involving x(s) in component i with
error events involving x(t) in component i. Hence $m_i(x(s)) = m_i(x(t))$.
Since the total number of edges in component i is 2|S|, we have
$m_i(x(a)) = 2|S|/2^{\nu+k}$ for all $\nu + k$ tuples a. □

An orbit S of the group $G^*_{\nu,k}$ is a set of error events satisfying

1. if $E \in S$ and $g \in G^*_{\nu,k}$ then $g(E) \in S$, and

2. if $E_1, E_2 \in S$ then there exists $g \in G^*_{\nu,k}$ such that $g(E_1) = E_2$.

Fig. 6 shows an orbit of $G^*_{2,1}$. Observe that $m_i(x(a)) = 1$ for all i and
for all a.
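The orbit of the Fig. 2 error event, and the edge counts $m_i$, can be enumerated directly for k = 1, $\nu$ = 2. This is a minimal sketch: an error event is modelled as an unordered pair of paths (a `frozenset`), and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def shift_blocks(t, i):
    """Return t^i: the blocks of t cycled i places to the right."""
    n = len(t)
    i %= n
    return t[n - i:] + t[:n - i]


def image(t, event):
    """Image of an error event (a pair of paths) under (g_{t^0}, g_{t^1}, ...)."""
    return frozenset(
        tuple(
            tuple(a ^ b for a, b in zip(edge, shift_blocks(t, i)))
            for i, edge in enumerate(path)
        )
        for path in event
    )


# The error event of Fig. 2: the all-zero path and the path 100, 010, 001.
fig2 = (
    ((0, 0, 0), (0, 0, 0), (0, 0, 0)),
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
)

orbit = {image(t, fig2) for t in product((0, 1), repeat=3)}

# edge_counts[i][label] is m_i(x(label)) for the orbit.
edge_counts = [
    Counter(path[i] for event in orbit for path in event) for i in range(3)
]
```

The eight group elements give only four distinct events, because t = (100) merely swaps the two paths; each of the eight edge labels then occurs once in every component.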


III. THE FIRST BOUND
In this section we derive the upper bound

$$ \frac{d^2_{\min}}{P} \le 4\left(1 + \frac{\nu}{k}\right). $$

This bound will be strengthened in later sections but it seems worth
presenting the simpler argument here.

Observe that the average transmitted signal power is simply the
average of the squares of the transmitted channel symbols, namely

$$ P = \frac{1}{2^{\nu+k}} \sum_{i=0}^{2^{\nu+k}-1} x_i^2. \tag{18} $$

[Fig. 6. An orbit of $G^*_{2,1}$. Panels include $(g_{001}, g_{100}, g_{010})(E)$ and $(g_{011}, g_{101}, g_{110})(E)$.]

(Recall that the channel symbol $x(a_0 \cdots a_{\nu+k-1})$ is also denoted $x_i$
where $i = a_0 + a_1 2 + \cdots + a_{\nu+k-1} 2^{\nu+k-1}$.) The Euclidean distance
between the paths of the error event E shown in Fig. 2 is

$$ d^2(E) = (x_0 - x_1)^2 + (x_0 - x_2)^2 + (x_0 - x_4)^2, $$

which is a quadratic form in the variables $x_i$. In general we define

$$ x^T = (x_0, x_1, \ldots, x_{2^{\nu+k}-1}), \tag{19} $$

where the superscript T denotes matrix transpose. Then the Euclidean
distance $d^2(E)$ between the paths of an error event E is given by

$$ d^2(E) = x^T A(E) x, \tag{20} $$

where A(E) is a symmetric, positive semi-definite matrix which we
call the distance matrix of E. The distance matrix A(E) has two
properties that we wish to note:
Property I. The ith diagonal element of A(E) counts the number
of times the symbol $x_i$ occurs in the error event.
Property II. The rows of A(E) sum to zero.
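The distance matrix and its two properties can be illustrated for the error event of Fig. 2. This is a minimal sketch; the helper names and the numerical symbol values in `x` are hypothetical and serve only to evaluate the quadratic form.

```python
def label_index(label):
    """Integer index of a binary tuple (b_0, b_1, ...): sum of b_i * 2**i."""
    return sum(bit << pos for pos, bit in enumerate(label))


def distance_matrix(path_a, path_b, size):
    """A(E): each component contributes (x_i - x_j)^2 for its pair of edges."""
    matrix = [[0] * size for _ in range(size)]
    for edge_a, edge_b in zip(path_a, path_b):
        i, j = label_index(edge_a), label_index(edge_b)
        matrix[i][i] += 1
        matrix[j][j] += 1
        matrix[i][j] -= 1
        matrix[j][i] -= 1
    return matrix


def quadratic_form(matrix, x):
    """Evaluate x^T A x."""
    n = len(x)
    return sum(x[r] * matrix[r][c] * x[c] for r in range(n) for c in range(n))


# Error event of Fig. 2: x(000) three times versus x(100), x(010), x(001).
correct = [(0, 0, 0)] * 3
decoded = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
A = distance_matrix(correct, decoded, 8)

x = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical symbol values x_0, ..., x_7
d2_direct = (x[0] - x[1]) ** 2 + (x[0] - x[2]) ** 2 + (x[0] - x[4]) ** 2
d2_matrix = quadratic_form(A, x)
```

The diagonal entry `A[0][0]` equals 3 because $x_0$ is used three times, every row sums to zero, and both ways of computing the distance agree.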
By (18) and (20),

$$ \frac{d^2_{\min}}{P} = \min_E \frac{x^T A(E) x}{P} = 2^{\nu+k} \min_E \frac{x^T A(E) x}{x^T x}, \tag{21} $$

where we minimize over all error events E.

Although we will make no use of the fact in this work, we note that 
in (21) only a finite number of error events need be considered, for no 
error event need be considered that has a repeated pair of states. Thus, 
if the pair of states u and w occur at time i and also at a later time j, 
all components between i and j may be eliminated and the remainder 
of the error event after time j may be placed after time i. Since 
components cannot make a negative contribution to $d^2(E)$ the new
error event has distance no greater than the original one. By (21) the
best normalized minimum distance that can be achieved for any choice
of channel symbols is

$$ 2^{\nu+k} \max_x \min_E \frac{x^T A(E) x}{x^T x}. \tag{22} $$

Consider an error event E with initial state (time t = 0) $a = (a_1, \ldots, a_{N-1})$
and final state $z = (z_1, \ldots, z_{N-1})$. If k tuples $b_1$, $b_1^*$
are input at time 0, then at time 1 the two paths occupy states
$(b_1, a_1, \ldots, a_{N-2})$ and $(b_1^*, a_1, \ldots, a_{N-2})$. There must be at least N − 1
further inputs before the paths can remerge. To remerge at z, the k
tuples $z_{N-1}, z_{N-2}, \ldots, z_1$ must be input in that order to both paths.
We denote this error event by $E(a, z; b_1, b_1^*)$. Thus, the minimal length
of an error event is $N = (k + \nu)/k$. Fig. 2 shows the error event
E(00, 00; 0, 1) which has minimal length 3.
Given an arbitrary set S of error events, define

$$ Q(S) = \frac{1}{|S|} \sum_{E \in S} A(E). \tag{23} $$

Let $S^N$ be the set of all error events of length N. Note that $S^N$ is fixed
by the group $G^*_{\nu,k}$.

Theorem 1: If k divides $\nu$ then the normalized minimum distance of any
sliding window trellis code with $2^\nu$ states and rate k bits/channel symbol
satisfies

$$ \frac{d^2_{\min}}{P} \le 4\left(1 + \frac{\nu}{k}\right). $$

Proof: By (22),

$$ \frac{d^2_{\min}}{P} \le 2^{\nu+k} \max_x \min_E \frac{x^T A(E) x}{x^T x} \le 2^{\nu+k} \max_x \min_{E \in S^N} \frac{x^T A(E) x}{x^T x} \le \frac{2^{\nu+k}}{|S^N|} \max_x \frac{x^T \left(\sum_{E \in S^N} A(E)\right) x}{x^T x}. $$

The last inequality simply states that the minimum is not more than
the average. Setting $A_N = \sum_{E \in S^N} A(E)$, we have

$$ \frac{2^{\nu+k}}{|S^N|} \max_x \frac{x^T A_N x}{x^T x} = \frac{2^{\nu+k}}{|S^N|} \lambda_1(A_N), $$

where $\lambda_1(A_N)$ denotes the largest eigenvalue of $A_N$. By Property I, the
ith diagonal entry of $A_N$ counts the total number of times the edge $x_i$
appears in some component of the error events of length N. By Lemma
3 all diagonal entries are equal to $2N|S^N|/2^{\nu+k}$. Property II implies
that all row sums of $A_N$ are zero; since the off-diagonal entries of $A_N$
are nonpositive, the sum of their absolute values in any row equals the
diagonal entry. By the Gersgorin Circle Theorem⁴

$$ \lambda_1(A_N) \le 2(\text{diagonal entry}) = 2\left(\frac{2N|S^N|}{2^{\nu+k}}\right) $$

and so

$$ \frac{d^2_{\min}}{P} \le 4N = 4\left(1 + \frac{\nu}{k}\right). \qquad \Box $$

Remarks: In Section IV we derive a formula for $Q(S^N)$, and, by
computing $\lambda_1(A_N)$, we prove

$$ \frac{d^2_{\min}}{P} \le \frac{2^{k+1}}{2^k - 1}\left(1 + \frac{\nu}{k}\right). $$
In Appendix B we prove that if $\nu = (N - 1)k + l$, where $0 < l < k$,
then

$$ \frac{d^2_{\min}}{P} \le 4\left(1 + \left\lfloor \frac{\nu}{k} \right\rfloor\right), $$

where $\lfloor y \rfloor$ denotes the integer part of y.


IV. A FORMULA FOR $Q(S^N)$ AND A SHARPER BOUND


In this section we derive a formula for $Q(S^N)$, the matrix obtained
by averaging the distance matrices of all error events of minimal
length $N = (k + \nu)/k$. We require a matrix representation of the group
$G_{\nu,k}$.

If A is an $m \times n$ matrix and B is an $m_1 \times n_1$ matrix, then the tensor
product $A \otimes B$ (also called the Kronecker product) is the $mm_1 \times nn_1$
matrix

$$ A \otimes B = \begin{pmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
\vdots & \vdots & & \vdots \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{pmatrix}. $$

Tensor products are discussed in Ref. 5, where they are called direct
products. For appropriately sized matrices, A, B, C, and D, we have
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$. If $\lambda$ is an eigenvalue of A with
associated eigenvector v, and $\mu$ is an eigenvalue of B with associated
eigenvector w, then $\lambda\mu$ is an eigenvalue of $A \otimes B$ with eigenvector
$v \otimes w$.

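We denote the $n \times n$ identity matrix by $I_n$ and we abbreviate $I_2$ to
I. Set

$$ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{24} $$

Consider the $2^{\nu+k} \times 2^{\nu+k}$ matrix

$$ P_i = \underbrace{I \otimes \cdots \otimes I}_{j \text{ terms}} \otimes A \otimes \underbrace{I \otimes \cdots \otimes I}_{i \text{ terms}} = I_{2^j} \otimes A \otimes I_{2^i}, \tag{25} $$

where $i + j + 1 = \nu + k$. This is the matrix

$$ \begin{pmatrix}
\begin{matrix} 0 & I_{2^i} \\ I_{2^i} & 0 \end{matrix} & & \\
& \ddots & \\
& & \begin{matrix} 0 & I_{2^i} \\ I_{2^i} & 0 \end{matrix}
\end{pmatrix} $$

with the indicated block repeated $2^j$ times along the main diagonal.

Define $u_i$, $i = 0, 1, \ldots, \nu + k - 1$, to be the binary $\nu + k$ tuple with a 1
in position i and 0's elsewhere. Let

$$ x = (x_0, \ldots, x_{2^{\nu+k}-1})^T = (x(0 \cdots 0), \ldots, x(1 \cdots 1))^T. $$

The permutation $g_{u_i}$ maps x(s) to $x(u_i + s)$ and so it interchanges edges
with subscripts differing by $2^i$. But this is precisely the effect of the
transformation $x \to P_i x$. If t is an arbitrary $\nu + k$ tuple then the matrix
describing the permutation $g_t$ is obtained by multiplying the appropriate
matrices $P_i$. For $t = (t_0, t_1, \ldots, t_{\nu+k-1})$ we define

$$ M(t) = M_{\nu+k-1} \otimes \cdots \otimes M_1 \otimes M_0, \tag{26} $$

where

$$ M_j = \begin{cases} I, & \text{if } t_j = 0 \\ A, & \text{if } t_j = 1. \end{cases} \tag{27} $$

Note that the subscript order in (26) is the reverse of the subscript
order in the vector t. We have now proved the following lemma.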


Lemma 4: If t is a $\nu + k$ tuple, then the permutation $g_t$: x(s) → x(s + t)
is represented by x → M(t)x.

As an example, the permutation $g_{010}$ given in (12) is represented by
the matrix $P_1 = I \otimes A \otimes I$, which is the matrix T given in (14). By Lemma 4
we may regard $G_{\nu,k}$ as the following group of matrices:

$$ G_{\nu,k} = \{M_{\nu+k-1} \otimes \cdots \otimes M_1 \otimes M_0 \mid M_j = I \text{ or } A,\ j = 0, \ldots, \nu + k - 1\}. \tag{28} $$
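Lemma 4 can be checked numerically for small $\nu + k$. The sketch below uses a hand-written Kronecker product on nested lists (hypothetical helper names, standard library only) and compares M(t) with the permutation matrix obtained directly from the rule x(s) → x(s + t), where tuple addition becomes a bitwise XOR of the packed integer indices.

```python
from itertools import product


def kron(a, b):
    """Kronecker (tensor) product of two matrices stored as nested lists."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


IDENTITY = [[1, 0], [0, 1]]
SWAP = [[0, 1], [1, 0]]  # the matrix A of the text


def index(t):
    """Pack the tuple (t_0, t_1, ...) into the integer sum of t_j * 2**j."""
    return sum(bit << j for j, bit in enumerate(t))


def m_matrix(t):
    """M(t) = M_{r-1} x ... x M_1 x M_0 with M_j = A if t_j = 1, else I."""
    result = [[1]]
    for bit in t:  # t_0 first, so every later factor is placed on the left
        result = kron(SWAP if bit else IDENTITY, result)
    return result


def xor_permutation(t, size):
    """Matrix of x -> y with y[s] = x[s XOR index(t)]."""
    shift = index(t)
    return [[1 if c == (r ^ shift) else 0 for c in range(size)] for r in range(size)]


lemma_4 = all(
    m_matrix(t) == xor_permutation(t, 8) for t in product((0, 1), repeat=3)
)
first_row_of_T = m_matrix((0, 1, 0))[0]
```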


We shall prove that $Q(S^N)$ is a particular linear combination of
matrices M(t) in $G_{\nu,k}$. To calculate $\lambda_1(Q(S^N))$ we need to work with
eigenvectors and eigenvalues of the matrices M(t).

The matrices M(t) are symmetric and they all commute; hence, they
can be simultaneously diagonalized. Let H be the tensor product of
$\nu + k$ copies of

$$ \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. $$

Observe that $H^{-1} = H^T$. Since $(1, 1)^T$ and $(1, -1)^T$ are eigenvectors of
A, the columns of H are eigenvectors of M(t) for all $\nu + k$ tuples t.
Thus, $H^{-1}M(t)H$ is diagonal for every matrix M(t) in $G_{\nu,k}$. If
$p = (p_0, \ldots, p_{\nu+k-1})$ is a binary $\nu + k$ tuple, define

$$ w(p) = w_{\nu+k-1} \otimes \cdots \otimes w_1 \otimes w_0, $$

where

$$ w_j = \begin{cases} (1, 1)^T, & \text{if } p_j = 0 \\ (1, -1)^T, & \text{if } p_j = 1. \end{cases} \tag{29} $$

The vectors w(p) are, up to the normalizing factor $2^{-(\nu+k)/2}$, the columns
of H. Note that w(p) is formed by reversing the vector p. We have
$A(1, 1)^T = (1, 1)^T$ and $A(1, -1)^T = -(1, -1)^T$. If $t = (t_0, \ldots, t_{\nu+k-1})$
then by (27) and (29)

$$ M(t)w(p) = M_{\nu+k-1}w_{\nu+k-1} \otimes \cdots \otimes M_0 w_0 = \left(\prod_{j=0}^{\nu+k-1} (-1)^{p_j t_j}\right) w(p) = (-1)^{p \cdot t} w(p), \tag{30} $$

where $p \cdot t$ is the dot product of the vectors p and t.

Lemma 5: Suppose R is a diagonable matrix that commutes with every
matrix M(t) in $G_{\nu,k}$. Then R is a linear combination of the matrices
M(t) in $G_{\nu,k}$.

Proof: If s, t are different $\nu + k$ tuples, then by Lemma 1, $g_s(x_0) \ne g_t(x_0)$.
The permutation matrices M(t) are therefore linearly independent
because the 1's in row 0 are in different positions. Thus we have
$2^{\nu+k}$ linearly independent diagonal matrices $H^{-1}M(t)H$. Since R commutes
with every matrix M(t), $H^{-1}RH$ commutes with every matrix
$H^{-1}M(t)H$, and therefore $H^{-1}RH$ is diagonal. The matrices $H^{-1}M(t)H$
span the set of diagonal matrices so $H^{-1}RH$ is a linear combination of
matrices $H^{-1}M(t)H$ and the lemma follows. □

Lemma 6: If S is a set of error events fixed by $G^*_{\nu,k}$ then $\sum_{E \in S} A(E)$ is
a linear combination of the matrices M(t) in $G_{\nu,k}$.

Proof: The distance matrix A(E) of an error event is the sum of
contributions from each component:

$$ A(E) = \sum_c A_c(E), \tag{31} $$

where we sum over the components c of E. The restriction of $G^*_{\nu,k}$ to
the edges in any component c is just the group $G_{\nu,k}$. If edges x(s), $x(\hat{s})$
appear in component c of error event E, then edges $g_t(x(s))$, $g_t(x(\hat{s}))$
appear in component c of error event $E' = g(E)$, where g is the element
of $G^*_{\nu,k}$ that acts as $g_t$ on component c. We have

$$ A_c(E') = M(t)^T A_c(E) M(t). \tag{32} $$

Since M(t) is a permutation matrix and $M(t)^2 = I$, we have $M(t)^T = M(t)^{-1}$.
Now g merely permutes the error events in S, so that by (32),

$$ \sum_{E \in S} A_c(E) = \sum_{E \in S} A_c(g(E)) = \sum_{E \in S} M(t)^{-1} A_c(E) M(t) = M(t)^{-1} \left(\sum_{E \in S} A_c(E)\right) M(t) \tag{33} $$

for all matrices M(t) and for all components c. Summing (33) over all
components c shows that $\sum_{E \in S} A(E)$, which is symmetric and hence
diagonable, commutes with every M(t); the result now follows from Lemma 5. □


Example: If S is the orbit of error events shown in Fig. 6 then

$$ \sum_{E \in S} A(E) = 3\, I \otimes I \otimes I - (I \otimes I \otimes A + I \otimes A \otimes I + A \otimes I \otimes I). \tag{34} $$

Consider $S^N$, the set of all error events $E(a, z; b_1, b_1^*)$ of minimal
length $N = (k + \nu)/k$. Recall that $a = (a_1, \ldots, a_{N-1})$ is the initial state,
$z = (z_1, \ldots, z_{N-1})$ is the final state, and $b_1$, $b_1^*$ are the first pair of
inputs. We have $|S^N| = \binom{2^k}{2} 2^{2\nu}$.


Lemma 7:
(1) Let $t = (t_0, \ldots, t_{N-1})$ and let $t' = (t_1, \ldots, t_{N-1})$ where $t_i$, i = 0, 1,
···, N − 1, is a binary k tuple. If $g^*_t = (g_{t^0}, g_{t^1}, \ldots, g_{t^{N-1}}, \cdots) \in G^*_{\nu,k}$, then

$$ g^*_t(E(a, z; b_1, b_1^*)) = E(a + t', z + t'; b_1 + t_0, b_1^* + t_0). \tag{35} $$

(2) The group $G^*_{\nu,k}$ partitions the set $S^N$ of error events of length N into
$2^\nu(2^k - 1)$ orbits each of size $2^{\nu+k-1}$.
Proof: Part (1) follows from the definition of $g_{t^i}$ given in (17). To
verify part (2) we note that, apart from the identity, $E(a, z; b_1, b_1^*)$ is
fixed only by the symmetry $g^*_b$, where $b = (b_1 + b_1^*, 0, 0, \ldots, 0)$. Hence,
each orbit consists of $2^{\nu+k-1}$ distinct error events. Since the total number
of error events in $S^N$ is $(2^\nu \cdot 2^\nu (2^k)(2^k - 1))/2$, we see that there are
$2^\nu(2^k - 1)$ orbits. □
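The counts in Lemma 7 can be confirmed by enumeration for small parameters. The following is a minimal sketch under stated assumptions: k-bit blocks are Python integers, an edge is the tuple (new input, state blocks), and an error event is an unordered pair of paths. All function names are hypothetical.

```python
from itertools import combinations, product


def shift_blocks(t, i):
    """Return t^i: the blocks of t cycled i places to the right."""
    n = len(t)
    i %= n
    return t[n - i:] + t[:n - i]


def path(state, inputs):
    """Edges traversed from `state`; each edge is (new input,) + state."""
    edges = []
    for block in inputs:
        edge = (block,) + state
        edges.append(edge)
        state = edge[:-1]
    return tuple(edges)


def minimal_events(k, n_blocks):
    """All error events E(a, z; b, b*) of minimal length N = n_blocks."""
    blocks = range(2 ** k)
    events = set()
    for a in product(blocks, repeat=n_blocks - 1):
        for z in product(blocks, repeat=n_blocks - 1):
            tail = z[::-1]  # the inputs z_{N-1}, ..., z_1 that force a merge at z
            for b, b_star in combinations(blocks, 2):
                events.add(
                    frozenset({path(a, (b,) + tail), path(a, (b_star,) + tail)})
                )
    return events


def image(t, event):
    """Image of an error event under (g_{t^0}, g_{t^1}, ...)."""
    return frozenset(
        tuple(
            tuple(x ^ y for x, y in zip(edge, shift_blocks(t, i)))
            for i, edge in enumerate(p)
        )
        for p in event
    )


def orbits(k, n_blocks):
    """Return the set S^N and its partition into orbits of the group."""
    events = minimal_events(k, n_blocks)
    group = list(product(range(2 ** k), repeat=n_blocks))
    parts = {frozenset(image(t, e) for t in group) for e in events}
    return events, parts
```

For k = 1, $\nu$ = 2 (N = 3) this gives 16 events in 4 orbits of size 4; for k = 2, $\nu$ = 2 (N = 2) it gives 96 events in 12 orbits of size 8, in agreement with the lemma.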
The orbit containing the error event $E(a, z; b_1, b_1^*)$ is determined by
a + z and $b_1 + b_1^*$. Setting $f = b_1 + b_1^*$, we denote this orbit by
S(a + z; f). This orbit contains E(a, z; 0, f); note that $f \ne 0$ because
$b_1 \ne b_1^*$. Recall that if f is a k tuple, then the $\nu + k$ tuple $(f0 \cdots 0)^i$
equals $(y_0, y_1, \ldots, y_{N-1})$ where $y_i = f$ and $y_j = 0$ for $j \ne i$.
Lemma 8: Let $S^N$ be the set of all error events of length N and let
S(a + z; f) be the orbit of $G^*_{\nu,k}$ containing the error event E(a, z; 0, f).
Then

$$ \text{(1)} \quad 2^{\nu+k} Q(S(a + z; f)) = 2N I_{2^{\nu+k}} - 2 \sum_{i=0}^{N-1} M((f0 \cdots 0)^i) \tag{36} $$

$$ \text{(2)} \quad 2^{\nu+k}(2^k - 1) Q(S^N) = 2(2^k - 1) N I_{2^{\nu+k}} - 2 \sum_{f \ne 0} \sum_{i=0}^{N-1} M((f0 \cdots 0)^i). \tag{37} $$

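Proof: We calculate the contribution to Q(S(a + z; f)) made by pairs
of edges in component 0. Since the restriction of $G^*_{\nu,k}$ to the edges in
any component is just the group $G_{\nu,k}$, this distance contribution is

$$ \begin{aligned}
&\frac{1}{2^{\nu+k}} \sum_t \left[g_t(x(0 a_1 \cdots a_{N-1})) - g_t(x(f a_1 \cdots a_{N-1}))\right]^2 \\
&\quad = \frac{1}{2^{\nu+k}} \sum_t \left[x(t + (0 a_1 \cdots a_{N-1})) - x(t + (f a_1 \cdots a_{N-1}))\right]^2 \\
&\quad = \frac{1}{2^{\nu+k}} \left[2 \sum_t x(t)^2 - 2 \sum_t x(t)\, x(t + (f0 \cdots 0))\right] \\
&\quad = \frac{1}{2^{\nu+k}}\, x^T \left[2 I_{2^{\nu+k}} - 2M(f0 \cdots 0)\right] x.
\end{aligned} $$

In general, the distance contribution made by edges in component i is

$$ \begin{aligned}
&\frac{1}{2^{\nu+k}} \sum_t \left[g_t(x(z_{N-i} \cdots z_{N-1} 0 a_1 \cdots a_{N-i-1})) - g_t(x(z_{N-i} \cdots z_{N-1} f a_1 \cdots a_{N-i-1}))\right]^2 \\
&\quad = \frac{1}{2^{\nu+k}} \sum_t \left[x(t + (z_{N-i} \cdots z_{N-1} 0 a_1 \cdots a_{N-i-1})) - x(t + (z_{N-i} \cdots z_{N-1} f a_1 \cdots a_{N-i-1}))\right]^2 \\
&\quad = \frac{1}{2^{\nu+k}}\, x^T \left[2 I_{2^{\nu+k}} - 2M((f0 \cdots 0)^i)\right] x.
\end{aligned} \tag{38} $$

Summing (38) over all components i, we obtain (36). Since (36) is
independent of a + z, we obtain the formula for $Q(S^N)$ by summing
(36) over all nonzero k tuples f. □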


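Remark: When k = 1, there is only one choice for f, namely f = 1, and
so every form Q(S(a + z; f)) is equal to $Q(S^N)$. For k = 1, $\nu$ = 2, we
have

$$ Q(S(00; 1)) = Q(S(10; 1)) = Q(S(01; 1)) = Q(S(11; 1)) = Q(S^N) = \tfrac{1}{4}\left[3I_8 - (I \otimes I \otimes A + I \otimes A \otimes I + A \otimes I \otimes I)\right] $$

[see the matrix given as (34)]. However, for k > 1, the form
Q(S(a + z; f)) will change with f. Thus, for k = 2, $\nu$ = 4, we have

$$ Q(S(a + z; 11)) = \tfrac{1}{32}\left[3I_{64} - (I_4 \otimes I_4 \otimes (A \otimes A) + I_4 \otimes (A \otimes A) \otimes I_4 + (A \otimes A) \otimes I_4 \otimes I_4)\right], $$

while

$$ Q(S(a + z; 10)) = \tfrac{1}{32}\left[3I_{64} - (I_4 \otimes I_4 \otimes (I \otimes A) + I_4 \otimes (I \otimes A) \otimes I_4 + (I \otimes A) \otimes I_4 \otimes I_4)\right]. $$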

Theorem 2: If k divides $\nu$, then the normalized minimum distance of
any sliding window trellis code with $2^\nu$ states and rate k bits/channel
symbol satisfies

$$ \frac{d^2_{\min}}{P} \le \frac{2^{k+1}}{2^k - 1}\left(1 + \frac{\nu}{k}\right). $$

Proof: From the proof of Theorem 1, we have

$$ \frac{d^2_{\min}}{P} \le 2^{\nu+k} \lambda_1(Q(S^N)) = \frac{1}{2^k - 1} \lambda_1(Q_N), \tag{39} $$

where $Q_N = (2^k - 1) 2^{\nu+k} Q(S^N)$. Let $c = (c_0, \ldots, c_{N-1})$ be a binary $\nu + k$
tuple and let $\gamma$ be the number of nonzero k tuples $c_i$. Then by (30), the
eigenvalue of $Q_N$ associated with w(c) is

$$ 2(2^k - 1)N - 2 \sum_{f \ne 0} \sum_{i=0}^{N-1} (-1)^{c_i \cdot f} = 2(2^k - 1)N - 2 \sum_{i=0}^{N-1} \sum_{f \ne 0} (-1)^{c_i \cdot f}, \tag{40} $$

where we sum over all nonzero k tuples f. Since

$$ \sum_{f \ne 0} (-1)^{c_i \cdot f} = \begin{cases} 2^k - 1, & \text{if } c_i = 0 \\ -1, & \text{if } c_i \ne 0, \end{cases} $$

eq. (40) becomes

$$ 2(2^k - 1)N - 2[(2^k - 1)(N - \gamma) - \gamma] = 2^{k+1}\gamma. \tag{41} $$

The largest eigenvalue of $Q_N$ is obtained when $\gamma = N = (1 + (\nu/k))$.
The theorem now follows from (39). □
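The eigenvalue formula used in this proof can be checked by brute force for small parameters. The sketch below builds the scaled average $Q_N$ directly from an enumeration of all minimal-length error events and tests $Q_N w(c) = 2^{k+1}\gamma(c) w(c)$ for every c. It is an illustration under stated assumptions (k-bit blocks stored as integers, labels packed with block j in bit positions jk to jk + k − 1), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def label_index(edge, k):
    """Pack an edge label (a tuple of k-bit blocks) into an integer."""
    return sum(block << (j * k) for j, block in enumerate(edge))


def path(state, inputs):
    """Edges traversed from `state`; each edge is (new input,) + state."""
    edges = []
    for block in inputs:
        edge = (block,) + state
        edges.append(edge)
        state = edge[:-1]
    return edges


def scaled_average(k, n_blocks):
    """Q_N = (2^k - 1) * 2^(nu+k) * Q(S^N), computed by enumeration."""
    size = 2 ** (k * n_blocks)  # 2^(nu + k)
    blocks = range(2 ** k)
    total = [[0] * size for _ in range(size)]
    count = 0
    for a in product(blocks, repeat=n_blocks - 1):
        for z in product(blocks, repeat=n_blocks - 1):
            for b, b_star in combinations(blocks, 2):
                count += 1
                first = path(a, (b,) + z[::-1])
                second = path(a, (b_star,) + z[::-1])
                for e, f in zip(first, second):
                    i, j = label_index(e, k), label_index(f, k)
                    total[i][i] += 1
                    total[j][j] += 1
                    total[i][j] -= 1
                    total[j][i] -= 1
    scale = Fraction((2 ** k - 1) * size, count)
    return [[scale * value for value in row] for row in total]


def eigen_check(k, n_blocks):
    """True if Q_N w(c) = 2^(k+1) * gamma(c) * w(c) for every tuple c."""
    q = scaled_average(k, n_blocks)
    size = len(q)
    for c in product(range(2 ** k), repeat=n_blocks):
        c_idx = label_index(c, k)
        w = [(-1) ** bin(c_idx & s).count("1") for s in range(size)]
        gamma = sum(1 for block in c if block)
        qw = [sum(q[r][s] * w[s] for s in range(size)) for r in range(size)]
        if qw != [2 ** (k + 1) * gamma * v for v in w]:
            return False
    return True
```

The largest eigenvalue found this way is $2^{k+1}N$, attained when every block of c is nonzero.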
Remarks: Observe that the largest eigenvalue of $Q(S^N)$ is associated
with w(c), where $c_0, c_1, \ldots, c_{N-1}$ are all nonzero. For example, with
k = 1, the largest eigenvalue, $4(1 + (\nu/k))$ has multiplicity one and is
associated with the eigenvector $(1, -1)^T \otimes (1, -1)^T \otimes \cdots \otimes (1, -1)^T$.
When k > 1, there will be several linearly independent eigenvectors
associated with $\lambda_1(Q(S^N))$ because there are several choices for c with
all $c_i \ne 0$. Also, note that Theorem 2 gives the same bound as Theorem
1 when k = 1. For $k \ge 2$, the bound of Theorem 2 is an improvement.
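In Appendix B we prove that if $\nu = (N - 1)k + l$ where $0 < l < k$,
then

$$ \frac{d^2_{\min}}{P} \le \frac{2^{k-l+1}}{2^{k-l} - 1}\left(1 + \left\lfloor \frac{\nu}{k} \right\rfloor\right), $$

where $\lfloor y \rfloor$ denotes the integer part of y.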





V. A FINAL BOUND OBTAINED FROM A WEIGHTED AVERAGE 


Let $S^{N+1}$ be the set of all error events of length $N + 1 = 2 + (\nu/k)$.
Let $Q(S^{N+1})$ be the matrix obtained by averaging the distance matrices
of all error events of length N + 1. In this section we derive a formula
for $Q(S^{N+1})$ and we prove

$$ \frac{d^2_{\min}}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right) $$

using a weighted average of $Q(S^N)$ and $Q(S^{N+1})$.

An error event E of length N + 1 is determined by the initial state
$a = (a_1, \ldots, a_{N-1})$, the final state $z = (z_1, \ldots, z_{N-1})$, the inputs $b_1$, $b_1^*$
at time 0, and the inputs $b_2$, $b_2^*$ at time 1. Since the two paths diverge
at time 0, we must have $b_1 \ne b_1^*$. To remerge at z the last N − 1 inputs
must be the k tuples $z_{N-1}, z_{N-2}, \ldots, z_1$ in that order. After N inputs
the two paths occupy states $z_2 \cdots z_{N-1} b_2$ and $z_2 \cdots z_{N-1} b_2^*$. At this
stage the two paths must be disjoint so $b_2 \ne b_2^*$. We denote this error
event E by $E(a, z; b_1, b_1^*; b_2, b_2^*)$ [equivalently $E(a, z; b_1^*, b_1; b_2^*, b_2)$].

The group $G^*_{\nu,k}$ maps error events of length N + 1 to error events of
length N + 1. To be specific, let $t = (t_0, \ldots, t_{N-1})$ be a $\nu + k$ tuple and
set $t' = (t_1, \ldots, t_{N-1})$, $t'' = (t_0, \ldots, t_{N-2})$. If
$g^*_t = (g_{t^0}, g_{t^1}, \ldots, g_{t^{N-1}}, g_{t^N}, \cdots)$
then it follows from the definition of $g_{t^i}$ given in (17) that

$$ g^*_t(E(a, z; b_1, b_1^*; b_2, b_2^*)) = E(a + t', z + t''; b_1 + t_0, b_1^* + t_0; b_2 + t_{N-1}, b_2^* + t_{N-1}). \tag{42} $$

This group action does not preserve a + z but it does preserve $b_1 + b_1^*$
and $b_2 + b_2^*$. Set $g = b_1 + b_1^*$ and $f = b_2 + b_2^*$. We denote the orbit of
$G^*_{\nu,k}$ containing the error event E(a′, z′; 0, g; 0, f) by S(a′, z′; g, f) (in
the discussion above, $a' = (a_1, \ldots, a_{N-2}, a_{N-1} + b_2)$ and
$z' = (z_1 + b_1, z_2, \ldots, z_{N-1})$). Note that $f, g \ne 0$.

If f, g are k tuples, then the $\nu + k$ tuple $(fg0 \cdots 0)^0 = (fg0 \cdots 0)$
and $(fg0 \cdots 0)^i$ is obtained from $(fg0 \cdots 0)^{i-1}$ by cycling the blocks
of k bits to the right and moving the last block to the front. Thus
$(fg0 \cdots 0)^{N-2} = (0 \cdots 0fg)$. Define matrices $M_i(fg0 \cdots 0)$,
i = 0, ···, N, in $G_{\nu,k}$ as follows:

$$ \begin{aligned}
M_0(fg0 \cdots 0) &= M(g0 \cdots 0), \\
M_i(fg0 \cdots 0) &= M((fg0 \cdots 0)^{i-1}), \quad i = 1, \ldots, N - 1, \\
M_N(fg0 \cdots 0) &= M(0 \cdots 0f).
\end{aligned} \tag{43} $$

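Example: For k = 1, $\nu$ = 2, the orbit S(00, 00; 1, 1) is shown in Fig. 7.
The quadratic form Q(S(00, 00; 1, 1)) is given by

$$ Q(S(00, 00; 1, 1)) = \tfrac{1}{8}\left[8I_8 - 2(I \otimes I \otimes A + I \otimes A \otimes A + A \otimes A \otimes I + A \otimes I \otimes I)\right] = \tfrac{1}{8}\left(8I_8 - 2\sum_{i=0}^{3} M_i(110)\right). \tag{44} $$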


Lemma 9: Let $S^{N+1}$ be the set of all error events of length N + 1 and let
S(a, z; g, f) be the orbit of $G^*_{\nu,k}$ containing the error event E(a, z; 0, g; 0,
f). Then,

$$ \text{(1)} \quad 2^{\nu+k} Q(S(a, z; g, f)) = 2(N + 1) I_{2^{\nu+k}} - 2 \sum_{i=0}^{N} M_i(fg0 \cdots 0). \tag{45} $$

$$ \text{(2)} \quad 2^{\nu+k}(2^k - 1)^2 Q(S^{N+1}) = 2(2^k - 1)^2 (N + 1) I_{2^{\nu+k}} - 2 \sum_{f, g \ne 0} \sum_{i=0}^{N} M_i(fg0 \cdots 0). \tag{46} $$

[Fig. 7. The orbit S(00, 00; 1, 1). Panels show E = E(00, 00; 0, 1; 0, 1) and its images under symmetries such as $(g_{100}, g_{010}, g_{001}, g_{100})$, $(g_{010}, g_{001}, g_{100}, g_{010})$, $(g_{110}, g_{011}, g_{101}, g_{110})$, $(g_{011}, g_{101}, g_{110}, g_{011})$, and $(g_{111}, g_{111}, g_{111}, g_{111})$.]

Proof: We calculate the contribution to Q(S(a, z; g, f)) made by pairs
of edges in component 0. This distance contribution is

$$ \frac{1}{2^{\nu+k}} \sum_t \left[x(t + (0 a_1 \cdots a_{N-1})) - x(t + (g a_1 \cdots a_{N-1}))\right]^2 = \frac{1}{2^{\nu+k}}\, x^T \left[2I_{2^{\nu+k}} - 2M(g0 \cdots 0)\right] x $$

as found in the proof of Lemma 8. Similarly, the contribution made
by pairs of edges in component N (the last component of the error
events) is

$$ \frac{1}{2^{\nu+k}} \sum_t \left[x(t + (z_1 \cdots z_{N-1} 0)) - x(t + (z_1 \cdots z_{N-1} f))\right]^2 = \frac{1}{2^{\nu+k}}\, x^T \left[2I_{2^{\nu+k}} - 2M(0 \cdots 0f)\right] x. $$

For i = 1, ···, N − 1, the contribution made by pairs of edges in
component i is

$$ \begin{aligned}
&\frac{1}{2^{\nu+k}} \sum_t \left[x(t + (z_{N-i+1} \cdots z_{N-1} 0 0 a_1 \cdots a_{N-i-1})) - x(t + (z_{N-i+1} \cdots z_{N-1} f g a_1 \cdots a_{N-i-1}))\right]^2 \\
&\quad = \frac{1}{2^{\nu+k}} \left[2 \sum_t x(t)^2 - 2 \sum_t x(t)\, x(t + (fg0 \cdots 0)^{i-1})\right] \\
&\quad = \frac{1}{2^{\nu+k}}\, x^T \left[2I_{2^{\nu+k}} - 2M((fg0 \cdots 0)^{i-1})\right] x.
\end{aligned} $$

The sum of the contributions from all N + 1 components is

$$ x^T Q(S(a, z; g, f))\, x = \frac{1}{2^{\nu+k}}\, x^T \left[2(N + 1) I_{2^{\nu+k}} - 2 \sum_{i=0}^{N} M_i(fg0 \cdots 0)\right] x. $$

This proves part (1). Observe that (45) is independent of a and z. We
obtain $Q(S^{N+1})$ by summing (45) over all pairs g, f of nonzero k tuples.
Since there are $(2^k - 1)^2$ such pairs,

$$ 2^{\nu+k}(2^k - 1)^2 Q(S^{N+1}) = 2(2^k - 1)^2 (N + 1) I_{2^{\nu+k}} - 2 \sum_{f, g \ne 0} \sum_{i=0}^{N} M_i(fg0 \cdots 0) $$

as required. □


Remarks: When k = 1, we must have f = g = 1 and so every form
Q(S(a, z; g, f)) is equal to $Q(S^{N+1})$. In this case, $N = 1 + \nu$ and

$$ Q(S^{N+1}) = \frac{1}{2^{\nu+1}}\left[2(2 + \nu) I_{2^{\nu+1}} - 2 \sum_{i=0}^{\nu+1} M_i(110 \cdots 0)\right] $$

[see the matrix given as (44)]. For k > 1, there are several choices for
f and g. Thus, for k = 2, $\nu$ = 4, we have, with g = (1, 1) and f = (0, 1),

$$ Q(S(a, z; 11, 01)) = \tfrac{1}{64}\left[8I_{64} - 2(I_4 \otimes I_4 \otimes (A \otimes A) + I_4 \otimes (A \otimes A) \otimes (A \otimes I) + (A \otimes A) \otimes (A \otimes I) \otimes I_4 + (A \otimes I) \otimes I_4 \otimes I_4)\right], $$

while with g = (01) and f = (10) we get

$$ Q(S(a, z; 01, 10)) = \tfrac{1}{64}\left[8I_{64} - 2(I_4 \otimes I_4 \otimes (A \otimes I) + I_4 \otimes (A \otimes I) \otimes (I \otimes A) + (A \otimes I) \otimes (I \otimes A) \otimes I_4 + (I \otimes A) \otimes I_4 \otimes I_4)\right]. $$


Theorem 3: If k divides $\nu$ then the normalized minimum distance of any
convolutionally derived trellis code with $2^\nu$ states and rate k bits/channel
symbol satisfies

$$ \frac{d^2_{\min}}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right). $$

Proof: If Q is any weighted average of $Q(S^N)$ and $Q(S^{N+1})$, then by (8)
we have

$$ \frac{d^2_{\min}}{P} \le 2^{\nu+k} \lambda_1(Q). \tag{47} $$

Let $\delta = 1/(2^{2k} - 1)$. Then $2(2^k - 1)\delta + (2^k - 1)^2\delta = 1$. Define Q to be
the following weighted average of $Q(S^N)$ and $Q(S^{N+1})$:

$$ Q = 2(2^k - 1)\delta\, Q(S^N) + (2^k - 1)^2 \delta\, Q(S^{N+1}). $$

Set

$$ Q_N = 2^{\nu+k}(2^k - 1) Q(S^N) $$

and

$$ Q_{N+1} = 2^{\nu+k}(2^k - 1)^2 Q(S^{N+1}). $$

Then by (47)

$$ \frac{d^2_{\min}}{P} \le \delta \lambda_1(2Q_N + Q_{N+1}). \tag{48} $$
The eigenvectors, w(c), of $Q_N$ and $Q_{N+1}$ are in 1-1 correspondence with
binary vectors $c = (c_1, \ldots, c_N)$, where $c_i$, i = 1, ···, N, are k tuples.
By (41)

$$ Q_N w(c) = 2^{k+1}\gamma(c) w(c), \tag{49} $$

where $\gamma(c)$ is the number of nonzero k tuples $c_i$. Introduce k tuples
$c_0 = c_{N+1} = 0$ and define

$$ \alpha(c) = |\{i \mid c_i = 0,\ c_{i+1} \ne 0 \text{ or } c_i \ne 0,\ c_{i+1} = 0\}| $$

and

$$ \beta(c) = |\{i \mid c_i \ne 0 \text{ and } c_{i+1} \ne 0\}|. \tag{50} $$

There are $(N + 1) - \alpha(c) - \beta(c)$ indices i, $0 \le i \le N$, for which
$c_i = c_{i+1} = 0$. By (30) the eigenvalue of $Q_{N+1}$ associated with w(c) is

$$ 2(2^k - 1)^2(N + 1) - 2 \sum_{f, g \ne 0} \sum_{i=0}^{N} (-1)^{c_i \cdot f + c_{i+1} \cdot g} = 2(2^k - 1)^2(N + 1) - 2 \sum_{i=0}^{N} \left(\sum_{f \ne 0} (-1)^{c_i \cdot f}\right) \left(\sum_{g \ne 0} (-1)^{c_{i+1} \cdot g}\right). \tag{51} $$

Recall that the sum $\sum_{f \ne 0} (-1)^{c_i \cdot f}$ is $(2^k - 1)$ when $c_i = 0$, but equal to
−1 whenever $c_i \ne 0$. Hence (51) is equal to

$$ \begin{aligned}
&2(2^k - 1)^2(N + 1) - 2\left[((N + 1) - \alpha(c) - \beta(c))(2^k - 1)^2 - \alpha(c)(2^k - 1) + \beta(c)\right] \\
&\quad = 2(2^k - 1)^2(\alpha(c) + \beta(c)) + 2(2^k - 1)\alpha(c) - 2\beta(c) \\
&\quad = 2^{k+1}\left(2^k(\alpha(c) + \beta(c)) - 2(\alpha(c) + \beta(c)) + \alpha(c)\right) \\
&\quad = 2^{k+1}\left[2^k(\alpha(c) + \beta(c)) - (\alpha(c) + 2\beta(c))\right].
\end{aligned} \tag{52} $$

Now, $\gamma(c)$ is the number of nonzero $c_i$'s. Since each nonzero $c_i$
appears in two pairs, $(c_{i-1}, c_i)$ and $(c_i, c_{i+1})$, we have

$$ \alpha(c) + 2\beta(c) = 2\gamma(c). \tag{53} $$

Substitution in (52) shows that the eigenvalue in (51), of $Q_{N+1}$ associated
with w(c), is

$$ 2^{k+1}\left[2^k(2\gamma(c) - \beta(c)) - 2\gamma(c)\right]. \tag{54} $$

By (49) and (54) we have

$$ (2Q_N + Q_{N+1})w(c) = 2^{k+1}\left(2\gamma(c) + 2^k(2\gamma(c) - \beta(c)) - 2\gamma(c)\right)w(c) = 2^{2k+1}(2\gamma(c) - \beta(c))w(c). \tag{55} $$

There are $N - \gamma(c)$ indices i, $1 \le i \le N$, for which $c_i = 0$. Since every
$c_j$, $1 \le j \le N$, appears in the two pairs $(c_{j-1}, c_j)$ and $(c_j, c_{j+1})$, there are
at most $2 + 2(N - \gamma(c))$ indices i, $0 \le i \le N$, for which $c_i = 0$ or $c_{i+1} = 0$.
Hence

$$ \beta(c) \ge (N + 1) - 2 - 2(N - \gamma(c)) = 2\gamma(c) - N - 1 $$

and

$$ 2\gamma(c) - \beta(c) \le N + 1. \tag{56} $$

Now (48), (55), and (56) imply

$$ \frac{d^2_{\min}}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}(N + 1) = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right). \qquad \Box $$
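The counting identities behind this proof depend only on which blocks $c_i$ are nonzero, so they can be checked exhaustively for small N. The following is a minimal sketch; each pattern is a tuple of booleans marking the nonzero blocks $c_1, \ldots, c_N$, and the function names are hypothetical.

```python
from itertools import product


def alpha_beta_gamma(nonzero):
    """Return (alpha, beta, gamma) for a pattern of nonzero blocks.

    nonzero[i - 1] is True when c_i != 0, for i = 1..N; the padding blocks
    c_0 and c_{N+1} are zero.
    """
    padded = (False,) + tuple(nonzero) + (False,)
    pairs = list(zip(padded, padded[1:]))  # (c_i, c_{i+1}) for i = 0..N
    alpha = sum(1 for u, v in pairs if u != v)
    beta = sum(1 for u, v in pairs if u and v)
    gamma = sum(1 for flag in nonzero if flag)
    return alpha, beta, gamma


def check_counting(n_max):
    """Check alpha + 2 beta = 2 gamma and max(2 gamma - beta) = N + 1."""
    for n in range(1, n_max + 1):
        best = 0
        for pattern in product((False, True), repeat=n):
            alpha, beta, gamma = alpha_beta_gamma(pattern)
            if alpha + 2 * beta != 2 * gamma:
                return False
            if 2 * gamma - beta > n + 1:
                return False
            best = max(best, 2 * gamma - beta)
        if best != n + 1:
            return False
    return True
```

Since the eigenvalue of $2Q_N + Q_{N+1}$ is $2^{2k+1}(2\gamma - \beta)$, a maximum of N + 1 for $2\gamma - \beta$ gives the largest eigenvalue $2^{2k+1}(N + 1)$.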





Remarks: Equality can hold in (56). If N is odd, set $c_0 = c_2 = \cdots = c_{N+1} = 0$
and $c_1, c_3, \ldots, c_N \ne 0$. Then $\gamma(c) = (N + 1)/2$ and $\beta(c) = 0$.
(Observe that for k = 1, $\nu$ = 2, the largest eigenvalue of the form
$2Q_N + Q_{N+1}$ is associated with eigenvector $(1, -1)^T \otimes (1, 1)^T \otimes (1, -1)^T$.)
If N is even, set $c_0 = c_3 = c_5 = c_7 = \cdots = c_{N+1} = 0$ and
$c_1, c_2, c_4, c_6, \ldots, c_N \ne 0$ to get $\gamma(c) = (N + 2)/2$ and $\beta(c) = 1$. Setting

$$ \frac{2^{k+1}}{2^k - 1}\left(1 + \frac{\nu}{k}\right) = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right) $$

yields $\nu = k(2^k - 1)$. If $\nu < k(2^k - 1)$, then Theorem 2 gives the stronger
bound; if $\nu > k(2^k - 1)$ then Theorem 3 gives the stronger bound. In
particular, for k = 1, Theorem 3 gives a stronger bound for any $\nu > 1$.
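The crossover point between the two bounds can be confirmed with exact rational arithmetic. This is a small sketch; the function names are hypothetical, and only values of $\nu$ that are multiples of k are considered.

```python
from fractions import Fraction


def theorem_2_bound(k, nu):
    """(2^(k+1) / (2^k - 1)) * (1 + nu/k)."""
    return Fraction(2 ** (k + 1), 2 ** k - 1) * (1 + Fraction(nu, k))


def theorem_3_bound(k, nu):
    """(2^(2k+1) / (2^(2k) - 1)) * (2 + nu/k)."""
    return Fraction(2 ** (2 * k + 1), 2 ** (2 * k) - 1) * (2 + Fraction(nu, k))


def stronger(k, nu):
    """Name the theorem with the smaller (stronger) upper bound."""
    b2, b3 = theorem_2_bound(k, nu), theorem_3_bound(k, nu)
    if b2 == b3:
        return "equal"
    return "Theorem 2" if b2 < b3 else "Theorem 3"
```

For example, `stronger(2, 6)` returns `"equal"` because 6 = 2(2² − 1), while `stronger(1, 2)` returns `"Theorem 3"`.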

The bound given by Theorem 3 is obtained from the largest eigenvalue
of a particular weighted average of $Q(S^N)$ and $Q(S^{N+1})$. In
Appendix A we use the duality theorem of linear programming to
prove that no other weighted average of $Q(S^N)$ and $Q(S^{N+1})$ gives a
stronger bound.

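In Appendix B we prove that if $\nu = (N - 1)k + l$, where $0 < l < k$,
then

$$ \frac{d^2_{\min}}{P} \le \frac{2^{2(k-l)+1}}{2^{2(k-l)} - 1}\left(2 + \left\lfloor \frac{\nu}{k} \right\rfloor\right), $$

where $\lfloor y \rfloor$ denotes the integer part of y.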








VI. CONCLUSIONS 


Three upper bounds on the normalized minimum distance, (d?iin/ 
P), have been given for trellis codes. The bound 


divin V 
< = 
wa (142) 


given in Theorem 1 is typical. This certainly provides nontrivial
information. For example, is it possible to gain 10 dB in minimum
distance using $2^6 = 64$ states at rate 1 bit/symbol? The answer is no.
Theorem 1 bounds the gain at 8.4 dB; Theorem 3 bounds the gain at
7.3 dB. Nevertheless, there still remain the questions of how tight
these bounds are and if they exhibit the “right” dependence on the
parameters $\nu$ and $k$. For example, consider the normalized minimum
distance for block codes of length $n$, having $2^{nk}$ code words ($k$ bits/
symbol). In that case, known upper bounds behave, for large $n$, like
$d^2/P \lesssim 2n/4^k$. Thus the linear dependence on $\nu$, a quantity analogous
to block length, appears correct. However, the true dependence on $k$
may be different from our bound. Table I gives upper and lower bounds
on the gain (in dB) that is possible at rate 1 bit/channel symbol. The
lower bounds arise from codes constructed by Ungerboeck.*

Table I. Possible gains at rate 1 bit/symbol

        Lower Bound      Upper Bound           Upper Bound
  ν     (Ungerboeck)     (Theorems 1 and 2)    (Theorem 3)
  2     2.5 dB           4.7 dB                4.3 dB
  3     3                6.0                   5.2
  4     3.4              7.0                   6.0
  5     4.2              7.8                   6.7
  6     4.5              8.4                   7.3
  7     5.1              9.0                   7.8
  8     5.3              9.5                   8.2
  9     5.6              10.0                  8.6
 10     5.8              10.4                  9.0
 11     6                10.7                  9.4
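The upper-bound columns of Table I can be approximately recomputed from the bound formulas. The sketch below is an illustration under stated assumptions: the gain is measured against uncoded $\pm 1$ signalling with squared distance 4 at $P = 1$, Theorem 1 has the form $4(1+\nu/k)$, and Theorem 3 has the form $\frac{2^{2k+1}}{2^{2k}-1}(2+\nu/k)$. The Ungerboeck lower bounds come from constructed codes and are not reproduced here.

```python
import math

UNCODED_D2 = 4.0  # assumed reference: squared distance of uncoded +/-1 symbols, P = 1

def gain_db(bound):
    """Gain in dB of a bound on d_min^2 / P over the uncoded reference."""
    return 10.0 * math.log10(bound / UNCODED_D2)

def theorem1_bound(nu, k=1):
    """Assumed form of the Theorem 1 bound (equal to Theorem 2 when k = 1)."""
    return 4.0 * (1 + nu / k)

def theorem3_bound(nu, k=1):
    """Assumed form of the Theorem 3 bound."""
    return 2 ** (2 * k + 1) / (2 ** (2 * k) - 1) * (2 + nu / k)

if __name__ == "__main__":
    print(" nu  Thm 1/2 (dB)  Thm 3 (dB)")
    for nu in range(2, 12):
        print(f"{nu:3d}  {gain_db(theorem1_bound(nu)):11.2f}  {gain_db(theorem3_bound(nu)):10.2f}")
```

The computed values should agree with the table to within about 0.1 dB; the printed table appears to mix rounding and truncation in its last digit.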

Also, minimum distance is by no means the complete story with
regard to error rate. The heuristics leading to the claim that terms
involving $d_{\min}$ would dominate an upper bound on the error rate make
the assumption that the infinite series determining the upper bound
converges. Even if a code with a good $d_{\min}$ were found, an upper bound
on error rate should still be computed for that particular code. As an
example of a catastrophe that may occur, consider the assignment of
edge labels $x^T = (1, -1, -1, 1, -1, 1, 1, -1)$ to the trellis of Fig. 1. One
observes that a pair of edges leaving a node always contributes
$(1 - (-1))^2 = 4$ to the distance and similarly for a pair of edges merging
into a node. One immediately concludes that no error event has
distance less than 8 for this edge assignment. Since $P = 1$, this is a 3
dB gain over the uncoded $\pm 1$ situation. How could this happen with
only $\pm 1$ symbols? One answer is that we forgot to include unmerged
events, events which go on forever. We had implicitly assigned infinity
to their distance, but now some have distance 4. However, this could
be rectified by perturbing the $\pm 1$ edge labels by small amounts. A
more serious trouble with this code is that an infinite number of error
events have (essentially) the minimum distance and so a coefficient
that we did not explicitly consider turns out to be infinite for this
particular code.


APPENDIX A


The upper bound of Theorem 3 is obtained from the largest eigenvalue
of a particular weighted average of the quadratic forms $Q(S^N)$
and $Q(S^{N+1})$. In this appendix we prove that no other weighted average
gives a stronger bound. We shall assume throughout that $\nu \ge k(2^k - 1)$
since the bound given in Theorem 3 improves upon that given in
Theorem 2 only for $\nu$ in this range.

If $r_1, r_2 \ge 0$ and $r_1 + r_2 = 1$, then

$$\frac{d_{\min}^2}{P} \le 2^{\nu+k}\,\lambda\bigl(r_1 Q(S^N) + r_2 Q(S^{N+1})\bigr).$$


Recall from (29) that the eigenvectors $w(c)$ of $Q(S^N)$ and $Q(S^{N+1})$ are
in 1:1 correspondence with binary $\nu + k$ tuples $c$. Let $c = (c_1, \ldots, c_N)$,
where $c_i$, $i = 1, \ldots, N$ is a binary $k$ tuple and let $c_0 = c_{N+1} = 0$. Recall
that

$$\alpha(c) = |\{i \mid c_i = 0,\ c_{i+1} \ne 0 \text{ or } c_i \ne 0,\ c_{i+1} = 0\}|,$$
$$\beta(c) = |\{i \mid c_i \ne 0 \text{ and } c_{i+1} \ne 0\}|,$$

and

$$\gamma(c) = |\{i \mid c_i \ne 0\}|.$$

Define $\phi_N(c)$ and $\phi_{N+1}(c)$ by $2^{\nu+k}(2^k - 1)Q(S^N)w(c) = \phi_N(c)w(c)$ and
$2^{\nu+k}(2^k - 1)^2 Q(S^{N+1})w(c) = \phi_{N+1}(c)w(c)$. Then by (49) and (54)

$$\phi_N(c) = 2^{k+1}\gamma(c) \tag{57}$$

and

$$\phi_{N+1}(c) = 2^{k+1}\bigl[2^k(2\gamma(c) - \beta(c)) - 2\gamma(c)\bigr]. \tag{58}$$


To find the optimal weighted average we have to solve the following
linear programming problem.

Choose real variables $r_1, r_2, r \ge 0$ so as to minimize $r$ subject to the
inequalities

$$-(r_1 + r_2) \le -1 \tag{59}$$

and

$$r_1\,\frac{\phi_N(c)}{2^k - 1} + r_2\,\frac{\phi_{N+1}(c)}{(2^k - 1)^2} - r \le 0 \quad \text{for all } \nu + k \text{ tuples } c.$$

In Theorem 3 we proved that a feasible solution to (59) is

$$r_1 = \frac{2}{2^k + 1}, \qquad r_2 = \frac{2^k - 1}{2^k + 1}, \qquad r = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right). \tag{60}$$


The linear program (59) is the dual of the primal linear program given
below.

Choose real variables $a_c, a \ge 0$, where the index $c$ runs through all
binary $\nu + k$ tuples, so as to maximize $a$ subject to the inequalities

$$\begin{aligned}
\frac{1}{2^k - 1}\Bigl(\sum_c \phi_N(c)\,a_c\Bigr) - a &\ge 0 \\
\frac{1}{(2^k - 1)^2}\Bigl(\sum_c \phi_{N+1}(c)\,a_c\Bigr) - a &\ge 0 \\
-\Bigl(\sum_c a_c\Bigr) &\ge -1.
\end{aligned} \tag{61}$$

If we can find a feasible solution to (61) with

$$a = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right),$$

then by the duality theorem of linear programming, (60) is an optimal
solution to (59). We consider two cases.


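Case 1. N odd


Pick $f = (f_1, \ldots, f_N)$, where $f_i$, $i = 1, \ldots, N$ is a binary $k$ tuple and
every $f_i$ is nonzero. Pick $g = (g_1, \ldots, g_N)$, where $g_i$, $i = 1, \ldots, N$ is a
binary $k$ tuple and $g_i \ne 0$ if and only if $i$ is odd. Then $\gamma(f) = N$, $\beta(f)
= N - 1$ and $\gamma(g) = (N + 1)/2$, $\beta(g) = 0$. By (57) and (58), $\phi_N(f)$,
$\phi_N(g)$, $\phi_{N+1}(f)$, and $\phi_{N+1}(g)$ are as follows:

$$\begin{array}{c|cc}
 & f & g \\ \hline
\phi_N & 2^{k+1}N & 2^k(N + 1) \\
\phi_{N+1} & 2^{k+1}[2^k(N + 1) - 2N] & 2^{k+1}[(2^k - 1)(N + 1)]
\end{array}$$

Set

$$a_c = \begin{cases}
\dfrac{(2^k - 1)(N + 1)}{(2^k + 1)(N - 1)}, & \text{if } c = f \\[2ex]
\dfrac{2(N - 2^k)}{(2^k + 1)(N - 1)}, & \text{if } c = g \\[2ex]
0, & \text{otherwise,}
\end{cases}
\qquad
a = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right). \tag{62}$$

Direct calculation shows that (62) is a feasible solution to (61). (Since
$\nu \ge k(2^k - 1)$, the variables $a_c$ are all nonnegative.)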


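Case 2. N even


Pick $h = (h_1, \ldots, h_N)$, where $h_i$, $i = 1, \ldots, N$, is a binary $k$ tuple,
$h_3 = h_5 = h_7 = h_9 = \cdots = h_{N-1} = 0$, and $h_1, h_2, h_4, h_6, h_8, \ldots, h_N$ are
nonzero. Then $\gamma(h) = (N + 2)/2$, $\beta(h) = 1$ and, by (57) and (58),

$$\phi_N(h) = 2^k(N + 2) \quad \text{and} \quad \phi_{N+1}(h) = 2^{k+1}[(2^k - 1)(N + 2) - 2^k].$$

Set (with $f$ the all-nonzero tuple of Case 1)

$$a_c = \begin{cases}
\dfrac{(2^k - 1)N - 2}{(2^k + 1)(N - 2)}, & \text{if } c = f \\[2ex]
\dfrac{2(N - 2^k)}{(2^k + 1)(N - 2)}, & \text{if } c = h \\[2ex]
0, & \text{otherwise,}
\end{cases}
\qquad
a = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{\nu}{k}\right). \tag{63}$$

Direct calculation shows that (63) is a feasible solution to (61). (Again
since $\nu \ge k(2^k - 1)$, the variables $a_c$ are all nonnegative.)
We have now shown that (60) is an optimal solution to (59).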
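The "direct calculation" in both cases can be spot-checked with exact rational arithmetic. This is a sketch under stated assumptions, not the authors' computation: it takes $\phi_N(c) = 2^{k+1}\gamma(c)$ and $\phi_{N+1}(c) = 2^{k+1}[2^k(2\gamma(c) - \beta(c)) - 2\gamma(c)]$, uses $2 + \nu/k = N + 1$, and checks the three primal constraints for the two nonzero weights. The function names are hypothetical.

```python
from fractions import Fraction

def phi_n(gamma, k):
    """Assumed form of (57)."""
    return 2 ** (k + 1) * gamma

def phi_n1(gamma, beta, k):
    """Assumed form of (58)."""
    return 2 ** (k + 1) * (2 ** k * (2 * gamma - beta) - 2 * gamma)

def primal_is_feasible(n, k):
    """Check the two-point primal solution for N = n (n >= 3 odd, n >= 4 even)."""
    target = Fraction(2 ** (2 * k + 1), 2 ** (2 * k) - 1) * (n + 1)
    if n % 2 == 1:
        # Case 1: weights on f (all nonzero) and g (nonzero in odd positions)
        a_f = Fraction((2 ** k - 1) * (n + 1), (2 ** k + 1) * (n - 1))
        a_other = Fraction(2 * (n - 2 ** k), (2 ** k + 1) * (n - 1))
        gamma_o, beta_o = (n + 1) // 2, 0
    else:
        # Case 2: weights on f and h
        a_f = Fraction((2 ** k - 1) * n - 2, (2 ** k + 1) * (n - 2))
        a_other = Fraction(2 * (n - 2 ** k), (2 ** k + 1) * (n - 2))
        gamma_o, beta_o = (n + 2) // 2, 1
    # f has gamma = N and beta = N - 1
    first = (a_f * phi_n(n, k) + a_other * phi_n(gamma_o, k)) / (2 ** k - 1)
    second = (a_f * phi_n1(n, n - 1, k)
              + a_other * phi_n1(gamma_o, beta_o, k)) / (2 ** k - 1) ** 2
    return (a_f >= 0 and a_other >= 0 and a_f + a_other <= 1
            and first >= target and second >= target)

if __name__ == "__main__":
    for k in (1, 2, 3):
        for n in range(max(2 ** k, 3), 12):
            print(k, n, primal_is_feasible(n, k))
```

When $N < 2^k$, that is $\nu < k(2^k - 1)$, the second weight becomes negative and the check fails, which matches the range restriction assumed throughout this appendix.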


APPENDIX B 


In this appendix we extend Theorems 1, 2, and 3 to the case when
$k$ does not divide $\nu$. Setting $\nu = (N - 1)k + l$, where $0 \le l < k$, we have
$N = \lfloor (\nu + k)/k \rfloor$ where $\lfloor y \rfloor$ denotes the integer part of $y$.

Encoder states are labelled with binary $\nu$ tuples in the way described
in Section II. Edges of the trellis are labelled with real numbers $x(s)$,
where $s$ is a binary $\nu + k$ tuple. The group $G$ is defined in the way
described in Section II; for each binary $\nu + k$ tuple $t$, we define a
permutation of the edge labels $x(s)$ by the rule

$$g_t(x(s)) = x(s + t).$$

The symmetry $g_t^*$ is the sequence

$$g_t^* = (g_{t^0}, g_{t^1}, g_{t^2}, \ldots),$$

where $t^0 = t$ and $t^i$ is obtained from $t^{i-1}$ by cycling the entries $k$ bits
to the right and moving the last $k$ bits to the front. When $k = 2$ and
$\nu = 3$,

$$g^*_{11001} = (g_{11001}, g_{01110}, g_{10011}, g_{11100}, g_{00111}, g_{11001}, \ldots).$$
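The cyclic shift that generates the index sequence $t^0, t^1, t^2, \ldots$ is easy to reproduce. The sketch below represents each $\nu + k$ tuple as a bit string; the function names are hypothetical, and the period formula is the one quoted in this appendix, $d = (k + \nu)/\gcd(k, k + \nu)$.

```python
from math import gcd

def cycle_right(t, k):
    """Cycle the bit string t by k places to the right (last k bits move to the front)."""
    return t[-k:] + t[:-k]

def shift_sequence(t, k, count):
    """Return the first `count` terms t^0, t^1, ... of the shift sequence."""
    seq = [t]
    for _ in range(count - 1):
        seq.append(cycle_right(seq[-1], k))
    return seq

def period(k, nu):
    """Number of shifts after which every tuple is guaranteed to repeat."""
    n = k + nu
    return n // gcd(k, n)

if __name__ == "__main__":
    print(shift_sequence("11001", 2, 6))
    # ['11001', '01110', '10011', '11100', '00111', '11001']
    print(period(2, 3))
    # 5
```

For $k = 2$, $\nu = 3$ the sequence returns to its starting value after five shifts, as in the example.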


In general, $t^i = t^{i+d}$ where $d = (k + \nu)/\gcd(k, k + \nu)$. Given any
component of the trellis and any pair of edges $x(s)$, $x(t)$ in that
component there is a unique element of $G^*$ interchanging $x(s)$ and
$x(t)$. The proof of Theorem 1 goes through without change and we
have


$$\frac{d_{\min}^2}{P} \le 4N_0, \tag{64}$$

where $N_0$ is the minimal length of an error event.

To see that $N_0 = N$, consider an error event $E$ with initial state
(time $t = 0$) $a = (a_1 \cdots a_{N-1} a_N)$, where $a_1, \ldots, a_{N-1}$ are $k$ tuples and
$a_N$ is an $l$ tuple. If $k$ tuples $b_1$, $b_1^*$ are input at time 0 then at time 1 the
two paths occupy states $(b_1, a_1, \ldots, a_{N-2}, \bar{a}_{N-1})$ and
$(b_1^*, a_1, \ldots, a_{N-2}, \bar{a}_{N-1})$, where $\bar{s}$ denotes the $l$ tuple $(s_1 \cdots s_l)$ obtained
from the $k$ tuple $(s_1 \cdots s_k)$ by deleting the last $k - l$ bits. At time 1 the
$k$ tuple $z_N c$ is input to both paths, where $c$ is a fixed but arbitrary
$k - l$ tuple. At time 2 the two paths occupy states
$(z_N c, b_1, a_1, \ldots, a_{N-3}, \bar{a}_{N-2})$ and $(z_N c, b_1^*, a_1, \ldots, a_{N-3}, \bar{a}_{N-2})$. At time $N$,
after inputs $z_{N-1}, \ldots, z_2$, the two paths occupy states
$(z_2, z_3, \ldots, z_{N-1}, z_N c, \bar{b}_1)$ and $(z_2, z_3, \ldots, z_{N-1}, z_N c, \bar{b}_1^*)$. If $\bar{b}_1 = \bar{b}_1^*$
then the two paths remerge at time $N$ in state
$z = (z_2, z_3, \ldots, z_{N-1}, z_N c, \bar{b}_1)$. We denote this error event by
$E(a, z; b_1, b_1^*)$. Thus by (64)

$$\frac{d_{\min}^2}{P} \le 4\left(1 + \left\lfloor \frac{\nu}{k} \right\rfloor\right) \tag{65}$$

for general $k$ and $\nu$.

Let $S(a, z; b_1, b_1^*)$ be the orbit of $G^*$ containing the error event
$E(a, z; b_1, b_1^*)$. We calculate the contribution to $Q(S(a, z; b_1, b_1^*))$ made
by pairs of edges in component 0 in the same way as Lemma 8. Setting
$f = b_1 + b_1^*$ this distance contribution is
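$$\frac{1}{2^{\nu+k}} \sum_t \bigl[g_t(x(b_1 a_1 \cdots a_{N-1} a_N)) - g_t(x(b_1^* a_1 \cdots a_{N-1} a_N))\bigr]^2
= \frac{1}{2^{\nu+k}}\, x^T\bigl[2I_{2^{\nu+k}} - 2M(f0 \cdots 0)\bigr]x.$$

Similarly, the distance contribution made by pairs of edges in
component $N - 1$ is

$$\frac{1}{2^{\nu+k}}\, x^T\bigl[2I_{2^{\nu+k}} - 2M((f0 \cdots 0)^{N-1})\bigr]x.$$

Note that the first $l$ bits of $f$ are zero and that the last $l$ bits of
$(f0 \cdots 0)^i$ are zero for $0 \le i \le N - 1$. Arguing as in Lemma 8 we
obtain

$$2^{\nu+k} Q(S(a, z; b_1, b_1^*)) = 2N I_{2^{\nu+k}} - 2 \sum_{i=0}^{N-1} M((f0 \cdots 0)^i).$$

There are $(2^{k-l} - 1)$ $k$ tuples $f$ for which $f \ne 0$ and $\bar{f} = 0$. Hence

$$2^{\nu+k}(2^{k-l} - 1)Q(S^N) = 2(2^{k-l} - 1)N I_{2^{\nu+k}} - 2 \sum_{\substack{f \ne 0 \\ \bar{f} = 0}} \sum_{i=0}^{N-1} M((f0 \cdots 0)^i).$$

Setting $Q = 2^{\nu+k}(2^{k-l} - 1)Q(S^N)$ we obtain

$$\frac{d_{\min}^2}{P} \le \frac{1}{2^{k-l} - 1}\,\lambda(Q), \tag{66}$$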

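which reduces to (39) when $l = 0$. The proof of Theorem 2 goes through
(change “$c_i = 0$ ($\ne 0$)” to “the last $k - l$ digits of $c_i$ are zero (nonzero)”)
and gives

$$\lambda(Q) = 2^{k-l+1} N. \tag{67}$$

By (66) and (67)

$$\frac{d_{\min}^2}{P} \le \frac{2^{k-l+1}}{2^{k-l} - 1}\left(1 + \left\lfloor \frac{\nu}{k} \right\rfloor\right) \tag{68}$$

for general $k$ and $\nu$.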


Finally we consider the set $S'$ of all error events of length $N + 1$ for
which the $k$ tuples $b_1$, $b_1^*$ input at time 0 satisfy $\bar{b}_1 = \bar{b}_1^* = 0$ and for
which the $k$ tuples $b_2$, $b_2^*$ input at time 1 satisfy $\bar{b}_2 = \bar{b}_2^* = 0$. Let
$E \in S'$ with initial state $a = (a_1, \ldots, a_{N-1}, a_N)$ and final state
$z = (z_1, \ldots, z_{N-1}, z_N)$, where $a_i$, $z_i$, $i = 1, \ldots, N - 1$ are $k$ tuples and $a_N$, $z_N$
are $l$ tuples. At time 2 the two paths occupy states
$((z_N 0 \cdots 0) + b_2, b_1, a_1, \ldots, a_{N-3}, \bar{a}_{N-2})$ and
$((z_N 0 \cdots 0) + b_2^*, b_1^*, a_1, \ldots, a_{N-3}, \bar{a}_{N-2})$.
At time $N$ the two paths occupy states $(z_2, z_3, \ldots, z_{N-1}, (z_N 0 \cdots 0) + b_2, \bar{b}_1)$
and $(z_2, z_3, \ldots, z_{N-1}, (z_N 0 \cdots 0) + b_2^*, \bar{b}_1^*)$. Set $f = b_1 + b_1^*$
and $g = b_2 + b_2^*$ and define $M_i(fg)$, $i = 0, \ldots, N$ as in (48). Arguing as
in Lemma 9 we obtain

$$2^{\nu+k}(2^{k-l} - 1)^2 Q(S')
= 2(2^{k-l} - 1)(2^{k-l} - 1)(N + 1) I_{2^{\nu+k}} - 2 \sum_{\substack{f \ne 0 \\ \bar{f} = 0}} \sum_{\substack{g \ne 0 \\ \bar{g} = 0}} \sum_{i=0}^{N} M_i(fg0 \cdots 0).$$

Let $\delta = 1/(2^{2(k-l)} - 1)$ and let $Q = 2(2^{k-l} - 1)\delta\, Q(S^N) + (2^{k-l} - 1)^2 \delta\, Q(S')$.
Then $2(2^{k-l} - 1)\delta + (2^{k-l} - 1)^2 \delta = 1$ and so

$$\frac{d_{\min}^2}{P} \le 2^{\nu+k}\,\lambda(Q).$$
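The proof of Theorem 3 goes through [change “$c_i = 0$ ($\ne 0$)” to “the
last $k - l$ digits of $c_i$ are zero (nonzero)”] and we obtain

$$\frac{d_{\min}^2}{P} \le \frac{2^{2(k-l)+1}}{2^{2(k-l)} - 1}\left(2 + \left\lfloor \frac{\nu}{k} \right\rfloor\right) \tag{69}$$

for general $k$ and $\nu$.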
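The three generalized bounds (65), (68), and (69) can be evaluated together for any $k$ and $\nu$. This sketch assumes the forms $4(1 + \lfloor \nu/k \rfloor)$, $\frac{2^{k-l+1}}{2^{k-l}-1}(1 + \lfloor \nu/k \rfloor)$, and $\frac{2^{2(k-l)+1}}{2^{2(k-l)}-1}(2 + \lfloor \nu/k \rfloor)$ with $l = \nu \bmod k$; the function name is hypothetical.

```python
from fractions import Fraction

def general_bounds(nu, k):
    """Return the assumed bounds (65), (68), (69) on d_min^2 / P as exact fractions."""
    q, l = divmod(nu, k)  # nu = q*k + l with 0 <= l < k, so N = q + 1
    m = k - l             # effective exponent k - l (at least 1)
    bound_65 = Fraction(4 * (1 + q))
    bound_68 = Fraction(2 ** (m + 1), 2 ** m - 1) * (1 + q)
    bound_69 = Fraction(2 ** (2 * m + 1), 2 ** (2 * m) - 1) * (2 + q)
    return bound_65, bound_68, bound_69

if __name__ == "__main__":
    for nu, k in [(6, 1), (4, 2), (3, 2), (5, 3)]:
        print(nu, k, general_bounds(nu, k))
```

When $k$ divides $\nu$ (so $l = 0$) these expressions reduce to the bounds of Theorems 1, 2, and 3.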